Object Detection in Real-Time

Computer Vision is one of the areas of deep learning that is advancing dramatically. It has given us the power to make cars drive themselves and recognize pedestrians, lanes, trees, and so on. We have all come across the so-called face lock that we use on our phones and laptops to make our devices more secure. It is just another example of Computer Vision, called Facial Recognition. Face recognition systems work far better than they used to, and we can now use them to keep our lockers, devices, and even houses safe. You may not know that some companies also use deep learning to show you the pictures that are most relevant to you. For example, you may see Tesla's Cybertruck if you are interested in Tesla's AI-powered autonomous cars. Some of you might be shocked by what I say next! AI has advanced to the point where, thanks to Deep Learning, a system can create new types of art by itself.

Why is Computer Vision one of the most exciting aspects of Deep Learning for me, and why might it be for you too? Because its rapid advancement has brought to life many new applications that were not possible before. And if you learn Computer Vision, you might end up building a brand new application or invention yourself. If that is what you want to do, best wishes from my side, and I hope you come up with a cool project in CV.

You might have used Google Lens on your Android device, which is an absolutely mind-bending application of computer vision in real-life situations. It can search for your homework questions directly from a captured image, without you having to type them. That's called OCR or, in its expanded form, Optical Character Recognition. But that's not the only thing Google Lens does. It also does plenty of other things, like Image Recognition (figuring out what's inside the image) and translating the text inside an image and overlaying the translation on the original text. One of the cool features I like in Google Lens is that if you show it a QR code, it recognizes that it is a QR code, decodes it, and shows the encoded information directly. Now we don't need to carry an extra app for decoding QR codes.

Another great use of computer vision is Object Detection, and it is one of the reasons we can build self-driving cars these days. I would say that this is a blessing for us. Humans can't always drive without losing focus, and AI can help us with that, but how would an AI know that it is about to crash into something? Of course, it is not enough to figure out where the other cars on the road are. We want more than that, and this is where Object Detection plays its role. It helps a car understand whether something is ahead of it and what that something is: it may be a pedestrian, a tree, or even a traffic signal. But one thing is still missing, and that is the position of those objects. There is a way for the car's AI to figure that out as well, instead of just detecting that the objects are there. Also, notice that there are usually multiple objects at various distances from the car at the same time, which is a challenge for autonomous cars. Or rather, it was! Perhaps I will talk about it in coming blog posts. But for now, cheers! Because we can implement all these complex tasks in an autonomous car that we can use in the real world.
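To make the "what is it, and where is it" idea a bit more concrete, here is a minimal sketch of what the output of an object detector for one camera frame might look like. Everything in it is an assumption made for illustration: the `Detection` class, the `confident_detections` helper, the 0.5 threshold, and the sample values are hypothetical and do not come from any particular library or real car.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure, for illustration only)."""
    label: str         # what the object is, e.g. "pedestrian"
    confidence: float  # how sure the model is, between 0 and 1
    box: tuple         # where it is: (x_min, y_min, x_max, y_max) in pixels


def confident_detections(detections, threshold=0.5):
    """Keep only the detections the model is reasonably sure about."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up detections for a single frame: several objects at once.
frame_detections = [
    Detection("pedestrian", 0.92, (410, 220, 470, 380)),
    Detection("traffic light", 0.81, (600, 40, 630, 110)),
    Detection("tree", 0.35, (50, 100, 180, 400)),
]

for d in confident_detections(frame_detections):
    print(d.label, d.box)
```

The point of the sketch is only that each detection carries both a label and a position, and that one frame can contain many of them.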

The next application is something that many of us use in our day-to-day life. It is called Neural Style Transfer. It lets us apply the style of one image to another image. How often do you use that? Probably more often than you think! We use it when we apply certain artistic filters to our photos to make them look more like artwork. We take a style image, from which we extract the style features, and a content image, to which we apply those style features. A neural network then repaints the content image in the new style. Such algorithms enable us to generate new kinds of artwork.
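What "style features" means depends on the method. One widely used formulation, which this post does not necessarily have in mind, describes style by how strongly the channels of a network's feature maps fire together, collected in a so-called Gram matrix. The sketch below is plain Python on tiny made-up feature maps; `gram_matrix`, `style_loss`, and the toy numbers are assumptions for illustration, and a real system would compute this on the activations of a trained network.

```python
def gram_matrix(feature_maps):
    """Channel-by-channel correlations of flattened feature maps.

    feature_maps: list of C channels, each a flat list of N activations.
    Returns a C x C matrix (list of lists), averaged over the N positions.
    """
    n = len(feature_maps[0])
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) / n for ch_j in feature_maps]
        for ch_i in feature_maps
    ]


def style_loss(style_features, generated_features):
    """Mean squared difference between the two Gram matrices."""
    g_style = gram_matrix(style_features)
    g_gen = gram_matrix(generated_features)
    c = len(g_style)
    total = sum(
        (g_style[i][j] - g_gen[i][j]) ** 2
        for i in range(c)
        for j in range(c)
    )
    return total / (c * c)


# Toy example: two channels, four spatial positions each (made-up values).
style = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
rearranged = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
different = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]

print(style_loss(style, rearranged))  # same channel statistics, different layout
print(style_loss(style, different))   # different channel statistics
```

In this toy case the first loss is zero even though the activations sit at different positions, which hints at why this kind of feature captures the overall texture of an image and not where things are. Training then nudges the generated image until its style loss is small while its content stays close to the content image.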

Although Computer Vision is an absolutely awesome branch of deep learning, its memory usage is very high because the input images can get very big. In Computer Vision, we look at the value of every pixel to train a computer for a visual task. If an image is 128 by 128 by 3, it already contains 49,152 values (the 3 is in the multiplication because there are three channels, RGB). Now imagine an image that is 1920 by 1080 by 3: that is 6,220,800 values. So there is always a tradeoff to make. It costs a lot to train on such large photos.
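The arithmetic above is easy to check, and extending it by one step shows why it hurts. The sketch below counts the input values for both image sizes and then estimates the weights of a single fully connected layer on top of them. The layer size of 1,000 units and the 4 bytes per weight (32-bit floats) are assumptions chosen for illustration, not figures from any specific model.

```python
def input_values(height, width, channels=3):
    """Number of raw numbers in an image (pixels times colour channels)."""
    return height * width * channels


small = input_values(128, 128)      # 128 x 128 RGB image
full_hd = input_values(1080, 1920)  # 1920 x 1080 RGB image

# Assumption for illustration: a first fully connected layer with 1,000 units,
# so every input value needs 1,000 weights, each stored as a 4-byte float.
hidden_units = 1000
bytes_per_weight = 4

for name, values in [("128x128", small), ("1920x1080", full_hd)]:
    weights = values * hidden_units
    gigabytes = weights * bytes_per_weight / 1e9
    print(f"{name}: {values:,} inputs, {weights:,} weights, ~{gigabytes:.1f} GB")
```

Under these assumptions the Full HD image needs over six billion weights in the first layer alone, which is roughly 25 GB before any training has started.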

To do these tasks more efficiently, we use Convolutional Neural Networks, which we will discuss in coming posts.

There is even more to Computer Vision! We will discuss more of it in the coming blog posts. Make sure you keep in touch with our blog.