DEV Community


Towards Learning Computer Vision

Naufan Rusyda Faikar (naruaika) ・6 min read

Today, I am going to share a short story about how I started studying computer vision out of my own curiosity some time ago. Please note that this is not even an introduction, just some casual small talk between us. Hopefully you can pick up a few insights and keywords from it.

What Got Me Interested

One thing worth questioning about both animated and live-action films that feature robot characters is that the robots almost always have eyes, usually two.


The figure above (credit to awn.com) shows Baymax, the main robot character in Big Hero 6 (2014), depicted as an inflatable robot with a carbon-fiber skeleton who serves as a personal healthcare companion. In the film, Baymax constantly asks questions, analyzes biometric data and even offers medical solutions for any changes he observes in his patient through his "two eyes", for example when Hiro Hamada gets a small wound from an accident, or when Hiro is gloomy over the death of his brother, Tadashi Hamada.


The image above (credit to jonnegroni.com), by contrast, shows EVE from WALL-E, a robot that, unlike Baymax, has no physical eyes on its visible body. Its camera sensor may still be there, embedded behind the glass screen like the front cameras on many of today's flagship smartphones. So, what are these eyes? Why do robots need the sense of sight? How important is it to them? And why do they, like primates, need two eyes?

A Robot is a machine that is made to look like a human and that can do some things that a human can do—Oxford Learner's Dictionaries

The main objective of a robot is to assist or even replace humans in performing their duties. In general, robots are used because they can work without emotions (such as sulking, sadness, shame and fear), boredom, drowsiness, fatigue or the desire to argue against orders (Breazeal, 1998). Robots benefit not only the manufacturing industry but also education, for example home robots as an alternative to interactive learning programs for children (Han et al., 2005); medicine, for example in the treatment of autism spectrum disorders (Diehl et al., 2012); and other fields.

In both examples, the robots can respond to humans during their interactions, including by making pseudo-facial expressions. In two-way communication between people, eye contact is an effective way to manage the situation, for instance when starting a conversation (Miyauchi et al., 2004). It is therefore natural to think that, to support such work, robots need eyes just as humans need them to carry out their tasks. However, a robot's eye behavior does not have to match the human eye exactly; it is enough for the robot to appear reliable and trustworthy (Lehmann et al., 2017).


We may have seen a vacuum-cleaning robot named Roomba, shown in the figure above (credit to youtube.com), stuck in the corner of a room. This robot cannot see, so it is unable to interpret the state of its surroundings. Hence, it could never work out that simply backing up and turning would get it out of this impasse.

To see is to become aware of somebody or something by using your eyes—Oxford Learner's Dictionaries

By seeing, a system can make the right decisions about which actions to take and which to avoid in the future, given the influence of its environment. This includes decisions about where, when and how an action is executed, and who will be involved in it. A robot equipped with sight, for instance, can decide in which direction and how far it has to move. This is why visual intelligence is often referred to as the foundation of intelligence, especially in the context of artificial intelligence (McCarthy et al., 2006; Bhatt & Freksa, 2015).

Having two eyes (commonly referred to as binocular vision) is certainly preferable to having just one (monocular vision). With two eyes and normal binocular vision, a visual system gains a wider horizontal field of view and an ability to distinguish fine details of objects that is about 10% better, because it receives relatively more information than a monocular system (Ciuffreda & Engber, 2002).

It should be noted that binocular vision is not the same as double vision. Primates and other mammals with forward-facing eyes experience only single vision. In double vision, an object appears twice, whereas in binocular fusion it appears only once (Nityananda & Read, 2017).

What I Have Found

The history of computer vision dates back to July 7, 1966, to a memo written in a laboratory at the Massachusetts Institute of Technology (MIT). The memo describes a plan to have several MIT students, each working independently over one summer, contribute to the construction of a significant part of a visual system capable of pattern recognition (Szeliski, 2010).

As we all know, computers have no senses, which makes adding visual functions to machines very challenging. Basically, what humans see is different from what computers see. To a computer, a colour image is usually represented as a three-dimensional array of integers: one two-dimensional array for each of the red, green and blue channels in the RGB colour model (the channels differ if another colour model is used).
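To make this concrete, here is a minimal sketch in Python using a made-up 2x2 image stored as plain nested lists. The tiny image, the `channel_plane` helper and the list-based layout are assumptions for illustration only; real code would typically load pixels with an imaging library into an array of shape (height, width, 3). The sketch shows the three-dimensional integer array and how each colour channel is itself a two-dimensional array.

```python
# Hypothetical 2x2 RGB image, for illustration only.
# image[row][col] = [red, green, blue], each value an integer in 0..255.
image = [
    [[255, 0, 0], [0, 255, 0]],      # row 0: a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 255]],  # row 1: a blue pixel, a white pixel
]

# The three dimensions: height (rows), width (columns), channels (R, G, B).
height = len(image)
width = len(image[0])
channels = len(image[0][0])


def channel_plane(img, c):
    """Hypothetical helper: return the 2-D array (list of rows) for channel c (0=R, 1=G, 2=B)."""
    return [[pixel[c] for pixel in row] for row in img]


# Split the 3-D array into one 2-D array per colour channel.
red, green, blue = (channel_plane(image, c) for c in range(channels))

print((height, width, channels))  # (2, 2, 3)
print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 255]]
```

Nothing in these numbers says "red pixel" or "white pixel" to the computer; the meaning only exists in how we choose to interpret the three channel values together.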


To recognize an image of a puppy, we only need to look at the shape of its muzzle, tongue, fur, tail and other distinctive features. A computer, however, reads the image only as arrays of independent integers. Even so, computers can work miracles with them. In fact, almost every aspect of human life has long been affected by computer vision!

As examples, here are some interesting videos about successful applications of computer vision today.

I have also found that Dr. Károly Zsolnai-Fehér discusses a lot of research related to computer vision, so we can follow current developments by visiting his channel, Two Minute Papers.

Acknowledgement

Honestly, this discussion merely builds on what I found after watching the recorded presentations below. Give them a try; she explains things very well. I am pretty sure you will find something that interests you.

References

Pardon me, I did not manage to write down every citation for the discussion above. I hope you understand.

  • Bhatt, M., & Freksa, C. (2015). Spatial computing for design—an artificial intelligence perspective. In Studying visual and spatial reasoning for design creativity (pp. 109-127). Springer, Dordrecht.
  • Breazeal, C. (1998, July). A motivational system for regulating human-robot interaction. In AAAI/IAAI (pp. 54-61).
  • Ciuffreda, K. J., & Engber, K. (2002). Is one eye better than two when viewing pictorial art? Leonardo, 35(1), 37-40.
  • Diehl, J. J., Schmitt, L. M., Villano, M., & Crowell, C. R. (2012). The clinical use of robots for individuals with autism spectrum disorders: A critical review. Research in Autism Spectrum Disorders, 6(1), 249-262.
  • Han, J., Jo, M., Park, S., & Kim, S. (2005, August). The educational use of home robots for children. In ROMAN 2005. IEEE International Workshop on Robot and Human Interactive Communication, 2005. (pp. 378-383). IEEE.
  • Lehmann, H., Keller, I., Ahmadzadeh, R., & Broz, F. (2017, November). Naturalistic conversational gaze control for humanoid robots: A first step. In International Conference on Social Robotics (pp. 526-535). Springer, Cham.
  • McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (2006). A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine, 27(4), 12-12.
  • Miyauchi, D., Sakurai, A., Nakamura, A., & Kuno, Y. (2004, April). Active eye contact for human-robot communication. In CHI'04 Extended Abstracts on Human Factors in Computing Systems (pp. 1099-1102).
  • Nityananda, V., & Read, J. C. (2017). Stereopsis in animals: evolution, function and mechanisms. Journal of Experimental Biology, 220(14), 2502-2512.
  • Szeliski, R. (2010). Computer vision: algorithms and applications. Springer Science & Business Media.
