What Is Image Recognition? A Two-Minute Rundown

By Natalie Fletcher

Computer vision might give computers the ability to see, but its different forms don’t all focus on, or do, the same thing. In this post, I break down what image recognition is and how it differs from other forms of computer vision AI.

What is image recognition?

Image recognition refers to computer vision’s (CV) ability to identify the dominant subject in an image and apply the relevant “concept” or “tag.” The subject could be an object, a place, a person, a word, or even an action.

Simple Definition: Where an image contains multiple objects, image recognition picks out the central object or what the camera was actually focusing on.
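
To make the idea of applying the most relevant tags concrete, here is a minimal Python sketch. It assumes a model has already produced a confidence score for each concept; the function name `top_concepts`, the threshold, and the scores themselves are made up for illustration and do not come from any real model or API.

```python
def top_concepts(scores, k=3, min_confidence=0.5):
    """Return up to k (concept, confidence) pairs, highest confidence first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Keep only the strongest concepts that clear the confidence threshold.
    return [(name, conf) for name, conf in ranked[:k] if conf >= min_confidence]


# Hypothetical scores for illustration only; not output from a real model.
example_scores = {
    "train": 0.98,
    "railway": 0.97,
    "station": 0.95,
    "snow": 0.41,
    "sunset": 0.33,
}

print(top_concepts(example_scores))
# prints [('train', 0.98), ('railway', 0.97), ('station', 0.95)]
```

The low-scoring background concepts ("snow", "sunset") drop out, leaving only tags for the dominant subject.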


In the picture below, while we can see buildings, snow on the ground, and a sunset, the dominant object is a train station. As such, all the predicted concepts relate to trains.

How is image recognition different from...

1) Visual recognition?

While image recognition recognizes the dominant object in a still image only, visual recognition does this for both images and video, applying the relevant concepts in each case. Both image and video recognition sit at the heart of Clarifai’s computer vision technology.

Simple Definition: Visual recognition is like image recognition but for both images and videos.


In this video, the dominant image changes. For instance, at the beginning, the scene takes place in a carpeted room where the puppy is playing with a human and a piece of tissue. Despite all these objects and subjects, the technology recognizes that the puppy is the main focus, and this is reflected in the predicted tag.



2) Object detection?

Object detection is a computer vision technique for detecting many different objects in images or videos, instead of just a single dominant object. It marks each one with a bounding box: a simple rectangle or square drawn around the detected object.

Simple Definition: Object detection uses bounding boxes to identify several different objects in an image.
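
As a rough illustration of how this differs from a single tag, a detector's output can be pictured as a list of labeled boxes rather than one concept. The sketch below is an assumption about how such output might be represented: the `Detection` class, the `keep_confident` helper, the pixel coordinates, and the scores are all hypothetical and not taken from any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float
    # Bounding box corners in pixels: (x_min, y_min, x_max, y_max).
    box: tuple


def keep_confident(detections, min_confidence=0.5):
    """Keep every detection at or above the threshold, best first."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


# Hypothetical detections for illustration only; not output from a real model.
detections = [
    Detection("face", 0.99, (40, 30, 120, 130)),
    Detection("face", 0.97, (200, 35, 280, 140)),
    Detection("face", 0.31, (400, 300, 420, 330)),
]

for d in keep_confident(detections):
    print(d.label, d.confidence, d.box)
```

Unlike image recognition, every object that clears the threshold is kept along with its own box, so two faces in one image yield two results.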


In the image below, Clarifai’s Celebrity Model puts a bounding box around the faces of cast members from the Emmy-winning sitcom “Modern Family.” It then shows its predictions for each face (though not in order).


3) Object tracking?

Object tracking refers to the process of following a specific object of interest among multiple objects in a given video. It traditionally has applications in video and real-world interactions where observations are made following an initial object detection.

Simple Definition: Object tracking uses bounding boxes to identify several objects in a video and follows them.
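
One simple way to follow boxes from frame to frame is to match each new box to the existing track it overlaps most, measured by intersection over union (IoU). The sketch below shows that generic greedy approach; it is not a description of how any particular product tracks objects. The function names, the 0.3 overlap threshold, and the box coordinates are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    overlap_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_tracks(tracks, boxes, next_id, min_iou=0.3):
    """Greedily match this frame's boxes to tracks (track id -> last known box).

    Returns the updated tracks and the next unused track id. Tracks with no
    matching box in this frame are dropped.
    """
    updated = {}
    unmatched = dict(tracks)
    for box in boxes:
        # Pick the not-yet-matched track that overlaps this box the most.
        best_id = max(unmatched, key=lambda t: iou(unmatched[t], box), default=None)
        if best_id is not None and iou(unmatched[best_id], box) >= min_iou:
            updated[best_id] = box
            del unmatched[best_id]
        else:
            # No good match: treat the box as a newly appeared object.
            updated[next_id] = box
            next_id += 1
    return updated, next_id


# Hypothetical boxes for two players over two consecutive frames.
frame_1 = [(10, 10, 50, 90), (200, 20, 240, 100)]
frame_2 = [(14, 12, 54, 92), (205, 22, 245, 102)]

tracks, next_id = update_tracks({}, frame_1, 0)
tracks, next_id = update_tracks(tracks, frame_2, next_id)
print(tracks)
# prints {0: (14, 12, 54, 92), 1: (205, 22, 245, 102)}
```

Each player keeps the same id across both frames, which is what lets a tracker say "this is the same object as before" rather than detecting it afresh.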


Below, we see a bounding box placed around each player during this 2006 FIFA World Cup match; each box follows its player as they move.


