News That Matters

Computer Vision: Understanding How it Works

Computer Vision is an artificial intelligence technology that allows machines to imitate human vision. 

For several years now, we have entered the era of the image. Our smartphones are equipped with high-definition cameras, and we constantly capture photos and videos that we share with the world on social networks. The use of image recognition AI is a sub-branch of computer vision.

Video hosting services like YouTube are experiencing explosive popularity, and hundreds of hours of videos are uploaded every minute and watched. Thus, the internet is now made up of both text and images.

However, while it is relatively easy to index the texts and crawl them with search engines such as Google, the task is much more difficult for images. To index them and allow them to be browsed, algorithms need to know their content.

For a very long time, the only way to present the content of an image to computers was to fill in its meta description when uploading. Now, thanks to “Computer Vision” technology, machines can “see” images and understand their content.

What is Computer Vision?

Computer Vision can be described as a field of research aimed at enabling computers to see. Concretely, the idea is to transmit information about the real world to a machine from the data of an observed image.

For the human brain, vision is natural. Even a child can describe the contents of a photo, summarize a video or recognize a face after seeing them only once. The purpose of computer vision is to transmit this human ability to computers.

It is a vast multidisciplinary field that can be considered a branch of artificial intelligence and Machine Learning. However, it is also possible to use specialized methods and general learning algorithms that are not necessarily related to artificial intelligence.

Many techniques from different fields of science and engineering can be exploited. Some vision tasks can be accomplished using a relatively simple statistical method. Others will require large sets of complex machine learning algorithms.

How Computer Vision Works

Computer vision algorithms are based on “pattern recognition.” Computers are trained on vast amounts of visual data. They process images, label objects, and find patterns in those objects.

For example, if you feed a machine with a million flower photos, it will analyze them and detect patterns common to all flowers. It will then create a model and subsequently recognize a flower each time it sees an image with one.

Computer vision algorithms rely on neural networks, which are supposed to mimic the workings of the human brain. However, we do not yet know precisely how the brain and the eyes process images. It is therefore difficult to know to what extent Computer Vision algorithms mimic this biological process.

The machines interpret the images in a very simple way. They perceive them as series of pixels, each with its own set of numeric values ​​corresponding to the colors. Therefore, an image is perceived as a grid made up of pixels, each of which can be represented by a number generally between 0 and 255.

Things get complicated for color images. Computers read colors as a series of three values: red, green, and blue. Again, the scale ranges from 0 to 255. So every pixel in a color image has three values ​​that the computer must record in addition to its position.

Each color value is stored in 8 bits. This number is multiplied by three for a color image, equivalent to 24 bits per pixel. For an image of 1024 × 768 pixels, it is, therefore, necessary to count 24 bits per pixel, or almost 19 million bits or 2.36 megabytes.

You will understand: it takes a lot of memory to store an image. The Computer Vision algorithm, on the other hand, must cover a large number of pixels for each image. However, it generally takes several tens of thousands of photos to train a deep learning model.

This is why computer vision is a complex discipline, requiring colossal computing power and storage capacity to train models. This is why it took many years for IT to develop and allow Computer Vision to take off.