
Couresesteach


Introduction to Computer Vision: How Machines See the World

📑 Table of Contents

1- Introduction

2- What is Computer Vision

3- How does computer vision work?

This is a course on computer vision. It is aimed at covering the foundational aspects of how to analyze images and extract content from them. That is, how can we build a computer or a machine that can see and interpret an image? First, what do I mean by foundational? I mean that we are going to cover the mathematical and computational methods that give you the core concepts of how a computer can be built to interpret images. Notice I am using the word interpret. In computer vision we are interested in extracting information and knowledge from an image. We want to go beyond processing an image to really knowing what is inside it, what its content is. So we will learn the math and the basic concepts of how to compute with an image and extract information from it.

Difference between Computer Vision and Computational Photography

What is the difference between these two classes and the material covered in them? There is indeed some overlap, especially in the initial few modules, where we learn about computing with images and extracting information from them.

Computational photography is really about capturing light from a scene in order to record it as a photograph or some other related novel artifact that showcases the scene. Image analysis is done to support the capture and display of the scene in novel ways, and some of it is actually about building newer forms of cameras and software to facilitate that process. Computer vision is really about interpretation and analysis of the scene: what is the content of the image, who is in it, what is in the image, and what is happening.

2- What is Computer Vision

Definition: Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos and deep learning models, machines can accurately identify and classify objects — and then react to what they “see.”

Definition 1: “Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do”

Definition 2: “Computer Vision is just a field of AI that enables computers or machines to see and understand the world and the things in it”

Definition 3-GPT: Computer vision is a field of study in computer science and artificial intelligence that focuses on enabling computers to interpret and understand visual data from the world around us. It involves developing algorithms and techniques that allow computers to analyze and make sense of images and videos, just like humans do.

Computer Vision is really about analyzing images and videos to extract knowledge from them. Mostly these images are of real scenes, like a street image with cars that an autonomous vehicle has to navigate through, or other types of images, like an X-ray of the inside of a human head, where we need to do image analysis to extract the things of interest for medical applications.

Computer vision is the field of computer science that focuses on creating digital systems that can process, analyze, and make sense of visual data (images, videos) in the same way that humans do. Computer Vision uses convolutional neural networks to process visual data at the pixel level and deep learning recurrent neural networks to understand how one pixel relates to another [2]. So essentially the goal is image and video understanding, which means labeling the interesting things in an image and also tracking them as they move.
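To make the pixel-level idea concrete, here is a minimal sketch (not from the original course material) of a small convolutional network in Keras. The 64×64 input size and the 10 output classes are illustrative assumptions, not values taken from the article.

```python
# Minimal sketch of a small CNN that works directly on raw pixel values.
# Input size and class count are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),          # a 64x64 RGB image: raw pixels in
    layers.Conv2D(16, 3, activation="relu"),  # filters scan local pixel neighbourhoods
    layers.MaxPooling2D(),                    # downsample, aggregating local evidence
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # a probability for each of 10 hypothetical classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Each convolution layer slides small filters over the pixel grid, so the model learns local patterns (edges, textures) that later layers combine into object-level evidence.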

Well, there are a couple of ways of thinking about it. I like this slide that I borrowed from Steve Seitz, where he talks about how every picture tells a story. One way of thinking about computer vision is that the goal is to interpret images: to say something about what is present in the scene or what is actually going on. So we take images in, and what comes out is something that has some meaning to it. That is, we extract and create some sort of interpretation, some sort of understanding of what that image represents. This is different from image processing, which many of you may have some exposure to: image processing is the manipulation of images, that is, images in and images out. We will talk a little bit about that, because you use image processing for computer vision. But fundamentally, computer vision is about understanding something that is in the image.

Why Study Computer Vision

Computer vision is the art of giving a computer the ability to understand the visual world. There are many real-world examples of computer vision that we use in our day-to-day lives.
But there are actually some really good reasons to do that. These days, images and imagery have become ubiquitous in all of our technology: cameras, video, streaming, sharing, and so on. So what has become fundamental to an awful lot of systems is the manipulation and processing of imagery, and the extraction of information from it. There are domains such as surveillance, building 3D models for medical imaging, and motion capture; these are all current industries that leverage computer vision in a variety of ways. When you take a selfie, why is there a small square around your face? When you scan a document, how are its edges already detected? How do streamers change their backgrounds, and how did Tesla create self-driving cars? All of these things were made possible by advancements in Computer Vision [2].

But most of all, the reason to do it is that it is just a really cool and deep set of problems. And it's way more fun than learning how to build compilers. And now I have to go apologize to all my compiler friends, but they know it's true.
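As a small taste of one of those everyday examples, here is a minimal sketch of detecting document edges with OpenCV's Canny detector, the kind of step a scanning app performs before outlining the page. The file name and thresholds are illustrative assumptions.

```python
# Minimal sketch of edge detection with OpenCV, assuming "document.jpg" exists.
import cv2

image = cv2.imread("document.jpg")                  # load the photo as a pixel array
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)      # edges only need intensity, not colour
blurred = cv2.GaussianBlur(gray, (5, 5), 0)         # smooth noise so only strong edges remain
edges = cv2.Canny(blurred, 75, 200)                 # Canny edge detector

cv2.imwrite("edges.png", edges)                     # white pixels mark the detected edges
```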

History of computer vision

1950s

Early experiments in computer vision took place in the 1950s, using some of the first neural networks to detect the edges of an object and to sort simple objects into categories like circles and squares [1].

1970s

In the 1970s, the first commercial use of computer vision interpreted typed or handwritten text using optical character recognition. This advancement was used to interpret written text for the blind.

1990s

As the internet matured in the 1990s, making large sets of images available online for analysis, facial recognition programs flourished. These growing data sets helped make it possible for machines to identify specific people in photos and videos.

3- How does computer vision work?

Computer vision technology tends to mimic the way the human brain works. But how does our brain solve visual object recognition? One popular hypothesis states that our brains rely on patterns to decode individual objects, and this concept is used to create computer vision systems [5]. The computer vision algorithms that we use today are based on pattern recognition. We train computers on a massive amount of visual data: computers process images, label the objects in them, and find patterns in those objects. For example, if we send a million images of flowers, the computer will analyze them, identify the patterns that are shared by all flowers, and, at the end of this process, create a model of a “flower.” As a result, the computer will be able to accurately detect whether a particular image is a flower every time we send it a picture.
Computer vision works in three basic steps:

1- Acquiring an image

Images, even large sets, can be acquired in real time through video, photos, or 3D technology for analysis.
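A minimal sketch of this acquisition step with OpenCV, either from a photo on disk or frame-by-frame from a camera; the file name and the camera index 0 are illustrative assumptions.

```python
# Minimal sketch of acquiring an image with OpenCV.
import cv2

# From a photo on disk
photo = cv2.imread("street.jpg")          # numpy array of shape (height, width, 3)

# From a video stream / webcam
cap = cv2.VideoCapture(0)                 # 0 = default camera
ok, frame = cap.read()                    # grab one frame in real time
cap.release()

if ok:
    print("Captured frame of shape:", frame.shape)
```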

2- Processing the image

Deep learning models automate much of this process, but the models are often trained by first being fed thousands of labeled or pre-identified images. Computer vision algorithms are based on pattern recognition: we train our model on a massive amount of visual data (images), and the model processes the labeled images and finds the patterns shared by those objects.
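A minimal sketch of this training step in Keras, assuming a folder layout such as data/flowers/ and data/not_flowers/ where each sub-folder name is the label; none of these names or hyperparameters come from the article.

```python
# Minimal sketch: train a small classifier on a folder of labelled images.
# The "data" directory layout is an illustrative assumption.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data",                 # each sub-folder name becomes a label
    image_size=(64, 64),
    batch_size=32,
)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(64, 64, 3)),  # pixels to [0, 1]
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(len(train_ds.class_names), activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)   # the model looks for patterns shared by each class
```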

3- Understanding the image

The final step is the interpretative step, where an object is identified or classified.

Real-life Example

For example, if we send a million pictures of vegetables to a model to train on, it will analyze them and build an engine (a computer vision model) based on the patterns that are shared by all vegetables. As a result, our model will be able to accurately detect whether a particular image is a vegetable every time we send it one.
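Here is a minimal sketch of that "send it a picture and it tells you what it is" idea, using a pretrained ImageNet model from Keras instead of training a vegetable model from scratch; the file name "broccoli.jpg" is an illustrative assumption.

```python
# Minimal sketch: classify a photo with a model pretrained on ImageNet.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, preprocess_input, decode_predictions)

model = MobileNetV2(weights="imagenet")               # already trained on 1,000 classes

img = tf.keras.utils.load_img("broccoli.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))

preds = model.predict(x)
for _, label, score in decode_predictions(preds, top=3)[0]:
    print(f"{label}: {score:.2f}")                    # top guesses with their confidence
```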

References

1- What is Computer Vision? & Its Applications

2- Introduction of Computer Vision

4- How computer vision works

5- Computer Vision 🤖 Fundamentals with OpenCV

6- Computer Vision

7- Computer Vision Tutorial Series M1C1

0- Computer Vision SAS

🎯 Call to Action (CTA)

Ready to take your computer vision skills to the next level?

✅ Enroll in our Full Course Computer Vision for an in-depth learning experience. (Note: If the link doesn't work, please create an account first and then click the link again.)
📬 Subscribe to our newsletter for weekly ML/NLP/Computer Vision tutorials
⭐ Follow our GitHub repository for project updates and real-world implementations

🎁 Access exclusive Machine learning bundles and premium guides on our Gumroad store: From sentiment analysis notebooks to fine-tuning transformers—download, learn, and implement faster.
