Computer Vision: A Complete Beginner's Guide

Introduction:

Computer vision is the ability of computers to recognize objects in images, videos, and real-life scenes and to extract information from them. Unlike humans, computers have a hard time processing visual data. We interpret what we perceive by drawing on our memories and prior experiences; computers have no such background to draw on. To bridge the gap between what they see and what they understand, computers rely on artificial intelligence (AI), machine learning (ML), deep learning (DL), neural networks, and parallel computing.

This article will explore computer vision technology, the algorithms at play, the types of computer vision techniques, and more.

How It Works:

Instead of looking at an entire image the way we do, a computer divides it into pixels. It uses the RGB values of each pixel to work out whether the image contains important features. Computer vision algorithms look at one small patch of pixels at a time and apply a kernel, or filter: a small grid of numbers by which the pixel values in the patch are multiplied and then summed. Kernels of this kind are used, for example, to detect the edges of objects. The computer recognizes and distinguishes the image by examining all of its aspects, including colors, shadows, and outlines.
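
The kernel idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function name convolve2d, the tiny 4 x 4 image, and the choice of a Sobel kernel are all assumptions made for this sketch. It slides a 3 x 3 kernel over a grayscale image, multiplies each pixel in the window by the matching kernel value, and sums the results. A large sum marks a vertical edge.

```python
# Horizontal-gradient Sobel kernel: responds strongly to vertical edges.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (lists of rows) without padding.

    Strictly speaking this is cross-correlation (the kernel is not flipped),
    which is what most deep learning libraries call convolution.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4 x 4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

edges = convolve2d(image, SOBEL_X)  # [[1020, 1020], [1020, 1020]]
```

Every window in this image straddles the dark-to-bright boundary, so every output value is large. On a uniformly colored patch the same kernel returns 0, which is how the filter separates edges from flat regions.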

Today, we use convolutional neural networks (CNNs) for modeling and training. CNNs are neural networks designed specifically for processing pixel data, and they are used for image recognition and processing. A convolutional layer contains many filters, whose values (weights) are stored as tensors. During training on large datasets, the network adjusts these values so that the filters respond to the characteristics that matter for distinguishing between classes. Getting there takes extensive training.

One way to help computers learn pattern recognition is to feed them numerous labeled images so that they can look for patterns in all the elements.

For example, if you feed a computer a million pictures labeled ‘lion’, algorithms analyze the colors, shapes, distances between shapes, boundaries between objects, and so on, and build a profile of what a lion looks like. The computer can then draw on this experience when it is given unlabeled images to decide whether an image shows a lion.
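
The idea of building a profile from labeled examples can be illustrated with a much simpler method than a CNN. The sketch below uses a nearest-centroid classifier as a stand-in: it averages the feature vectors of each label into a profile and assigns a new example to the closest profile. The function names, the two-number features, and the training data are all hypothetical, chosen only to show the idea.

```python
def build_profiles(labeled_examples):
    """Average the feature vectors of each label into one profile per label."""
    sums, counts = {}, {}
    for features, label in labeled_examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(features, profiles):
    """Return the label whose profile is closest (squared Euclidean distance)."""
    def distance(profile):
        return sum((f - p) ** 2 for f, p in zip(features, profile))
    return min(profiles, key=lambda label: distance(profiles[label]))

# Hypothetical 2-value features per image, e.g. (tawny-color score, mane score).
training = [
    ([0.9, 0.8], "lion"),
    ([0.8, 0.9], "lion"),
    ([0.2, 0.1], "not lion"),
    ([0.1, 0.2], "not lion"),
]
profiles = build_profiles(training)
prediction = classify([0.7, 0.75], profiles)  # "lion"
```

A real system would learn its features from the pixels themselves instead of being handed two scores, but the workflow is the same: labeled examples go in, and a model that can judge unlabeled images comes out.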

Here’s a real-life scenario for better understanding.

A digital image is made up of small blocks called pixels. We write an image's dimensions as X × Y, which means that the image has a total of X × Y pixels.

In a grayscale image, each pixel's brightness is represented by a single 8-bit number, ranging from 0 (black) to 255 (white).

The computer does not store the image as we see it, but as a list of numbers like the one shown below:

{157, 153, 174, 168, 150, 152, 129, 151, 172, 161, 155, 156,
 155, 182, 163, 74, 75, 62, 33, 17, 110, 210, 180, 154,
 180, 180, 50, 14, 34, 6, 10, 33, 48, 106, 159, 181,
 206, 109, 5, 124, 131, 111, 120, 204, 166, 15, 56, 180,
 194, 68, 137, 251, 237, 239, 239, 228, 227, 87, 71, 201,
 172, 105, 207, 233, 233, 214, 220, 239, 228, 98, 74, 206,
 188, 88, 179, 209, 185, 215, 211, 158, 139, 75, 20, 169,
 189, 97, 165, 84, 10, 168, 134, 11, 31, 62, 22, 148,
 199, 168, 191, 193, 158, 227, 178, 143, 182, 106, 36, 190,
 205, 174, 155, 252, 236, 231, 149, 178, 228, 43, 95, 234,
 190, 216, 116, 149, 236, 187, 86, 150, 79, 38, 218, 241,
 190, 224, 147, 108, 227, 210, 127, 102, 36, 101, 255, 224,
 190, 214, 173, 66, 103, 143, 96, 50, 2, 109, 249, 215,
 187, 196, 235, 75, 1, 81, 47, 0, 6, 217, 255, 211,
 183, 202, 237, 145, 0, 0, 12, 108, 200, 138, 243, 236,
 195, 206, 123, 207, 177, 121, 123, 200, 175, 13, 96, 218};
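
A flat list like this only becomes an image once we know its width. The sketch below assumes the values are stored row by row with 12 values per row, which is an assumption, since the list itself does not say so. The helper name to_rows is hypothetical. It splits the list into rows so that a pixel can be looked up by row and column.

```python
def to_rows(flat_pixels, width):
    """Split a flat, row-major pixel list into rows of `width` values."""
    if len(flat_pixels) % width != 0:
        raise ValueError("pixel count is not a multiple of the row width")
    return [flat_pixels[i:i + width] for i in range(0, len(flat_pixels), width)]

# First 24 values of the list above, treated as two rows of 12 (an assumption).
flat = [157, 153, 174, 168, 150, 152, 129, 151, 172, 161, 155, 156,
        155, 182, 163, 74, 75, 62, 33, 17, 110, 210, 180, 154]

rows = to_rows(flat, width=12)
pixel = rows[1][3]  # row 1, column 3 (both counted from 0) -> 74
```

The same pixel can be read straight from the flat list as flat[row * width + column], which is how image libraries usually address pixel memory.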

Things get more complex with color images because the values are in RGB format, i.e., each pixel is represented by three values. For example, if the image dimension is 16 × 12, we would need a total of 16 × 12 × 3 = 576 values to represent it. A model has to iterate over each of these pixels many times during training, and we would typically need thousands of images to train a model well for a particular task.
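
The count can be checked with a little arithmetic. The sketch below assumes the grayscale example above, with its 192 values, is a 16 × 12 image. The function names are hypothetical, and the grayscale conversion uses the widely used BT.601 weights, which is one common choice and not the only one.

```python
def values_needed(height, width, channels=3):
    """Number of stored values for an image with the given shape."""
    return height * width * channels

def to_gray(rgb):
    """Convert one (R, G, B) pixel to brightness with the common BT.601 weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

gray_values = values_needed(16, 12, channels=1)  # 192
rgb_values = values_needed(16, 12)               # 576
white = to_gray((255, 255, 255))                 # 255
```

The three weights add up to 1, so a pure white RGB pixel maps to the brightest grayscale value and a pure black pixel maps to 0.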

Applications of Computer Vision:

Deep learning is used in the majority of modern computer vision applications, including facial recognition, self-driving vehicles, and cancer diagnosis, to name a few.

Facial recognition

Computer vision is used extensively in facial recognition systems, thanks to its ability to find patterns in image data. The system compares the patterns it finds in a face with those of known faces, and then makes a recommendation or takes an action based on the result.

Self-driving vehicles

Computer vision enables autonomous vehicles to gain a sense of their surroundings by creating 3D maps out of real-time images. Cameras capture video from different angles around the car and feed it to computer vision software, which processes it to identify the edges of roads, read traffic signs, and detect other cars, objects, and pedestrians. The car can then steer its way along streets and highways, avoid obstacles, and drive its passengers to their destination.

Healthcare

Computer vision has been used in a variety of healthcare applications to help healthcare professionals make better decisions about patient care. Medical image analysis is one such application. It produces visualizations of specific organs or tissues to enable a more accurate diagnosis.

Augmented reality (AR)

Computer vision is used in augmented reality to overlay imagery and sound on real-world environments. It detects real-life objects through the lens of, say, a smartphone, and performs computations on them. Among other things, we can use AR to measure the height of a table with nothing more than a smartphone's camera.

Super-resolution imaging (SR)

SR is a technique that enhances the resolution of images. Four commonly used models are the enhanced deep super-resolution network (EDSR), the efficient sub-pixel convolutional neural network (ESPCN), the fast super-resolution convolutional neural network (FSRCNN), and the Laplacian pyramid super-resolution network (LapSRN). Pre-trained versions of these models can easily be downloaded and used.

When super-resolution works from several low-quality images of the same scene, the model treats each image as carrying slightly different information. Once the variations between the images have been analyzed, the model produces images of significantly higher quality.

Optical character recognition (OCR)

OCR extracts text from images, scanned documents, and image-only PDFs. It identifies individual letters and assembles them into words and sentences. The text is located and read using various contouring and thresholding techniques. Libraries like OpenCV are commonly used for these steps.
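
Thresholding is the simplest of these steps to show. The sketch below is a plain-Python illustration with a hypothetical function name, a made-up 3 x 5 scan, and an arbitrary cutoff of 128. It turns a grayscale scan into a binary image in which ink pixels are 1 and paper pixels are 0.

```python
def threshold(image, cutoff=128):
    """Binarize a grayscale image: 1 for dark (ink) pixels, 0 for background."""
    return [[1 if value < cutoff else 0 for value in row] for row in image]

# Hypothetical 3 x 5 scan of a dark vertical stroke on light paper.
scan = [
    [250, 240, 30, 245, 251],
    [248, 235, 25, 238, 249],
    [252, 242, 35, 241, 250],
]
binary = threshold(scan)
# [[0, 0, 1, 0, 0],
#  [0, 0, 1, 0, 0],
#  [0, 0, 1, 0, 0]]
```

In OpenCV the comparable operation is provided by cv2.threshold. A fixed cutoff works for clean scans, while unevenly lit pages usually need an adaptive method.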

OCR technology is widely used to digitize text, scan passports for automatic check-in, evaluate customer data, etc.

Techniques of computer vision

Computer vision comprises various techniques such as semantic segmentation, localization, object detection, and instance segmentation. They can be applied to calculate the speed of an object in a video, to create a 3D model of a given scene, and to clean up an image, for example by removing noise or excessive blurring.

Semantic segmentation

Semantic segmentation assigns a class label to every pixel in an image, grouping together the pixels that belong to the same object class. For example, it determines whether a given pixel belongs to a cat or to a dog. The label (in this case, either cat or dog) applies to the pixel, not to the image as a whole.
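
The output of semantic segmentation is a label map with one class per pixel. The sketch below assumes a model has already produced a score for each class at each pixel and shows only the final step: picking the highest-scoring class. The class list, the function name, and the scores are all made up for illustration.

```python
CLASSES = ["background", "cat", "dog"]  # hypothetical label set

def label_map(scores):
    """Pick the highest-scoring class index for every pixel.

    `scores[y][x]` is a list with one score per class, as a segmentation
    model might produce; here the numbers are made up.
    """
    return [
        [max(range(len(pixel)), key=lambda c: pixel[c]) for pixel in row]
        for row in scores
    ]

# Hypothetical scores for a 2 x 2 image.
scores = [
    [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
    [[0.1, 0.2, 0.7], [0.3, 0.6, 0.1]],
]
labels = label_map(scores)                          # [[0, 1], [2, 1]]
names = [[CLASSES[c] for c in row] for row in labels]
```

The label map has the same height and width as the image, so each entry says which class the pixel at that position belongs to.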

Localization

In localization, an image is given a label for its main object, the object is located, and a bounding box is drawn around it. This box acts as a point of reference for object detection.
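
Drawing the box is the easy part once the object's pixels are known. The sketch below assumes the object has already been found and marked in a binary mask. The function name and the mask are hypothetical. It returns the smallest box that encloses every marked pixel.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the nonzero pixels, or None if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return None
    return (rows[0], cols[0], rows[-1], cols[-1])

# Hypothetical 4 x 5 mask: 1 marks pixels that belong to the object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(mask)  # (1, 1, 2, 3)
```

The coordinates are inclusive, so this box spans rows 1 to 2 and columns 1 to 3.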

Object detection

Object detection is a method to find occurrences of real-world objects such as faces, bikes, and buildings in images and videos. Learning algorithms are used to identify the instances of the objects.

Instance segmentation

Instance segmentation builds on the techniques above: it detects each individual object and labels the pixels that belong to it, so that two objects of the same class receive separate labels. It is often used in real-world applications such as self-driving cars, smart farming, and medical imaging.
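
Modern instance segmentation relies on trained models, but the idea of giving each object its own label can be shown with a classical stand-in: connected-component labeling. The sketch below assumes a binary mask in which objects do not touch. The function name and the mask are hypothetical. It flood-fills each group of connected pixels and numbers the groups 1, 2, and so on.

```python
def label_instances(mask):
    """Give each 4-connected blob of 1s in `mask` its own positive integer label."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                stack = [(y, x)]
                while stack:  # flood fill outward from this seed pixel
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
    return labels, next_label

# Hypothetical mask with two separate objects of the same class.
mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
labels, count = label_instances(mask)
# labels == [[1, 1, 0, 0, 2],
#            [1, 0, 0, 2, 2],
#            [0, 0, 0, 0, 0]]
# count == 2
```

Both objects would share one label under semantic segmentation. Here they are told apart, which is the difference the two techniques are named for.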

In this article, we discussed how computer vision works, the techniques used, the applications, and more. While there have been remarkable advances in this field over the years, challenges remain, such as data quality, hardware limitations, and the optimization of deep learning models. However, with growing demand, ongoing research, and evolving technologies, computer vision will continue to improve in the years to come.

Parting Notes:

In this article, we covered the basics of computer vision, the techniques involved, and its applications.

Thanks for reading…
