If you’ve already heard of computer vision (CV), then you’re probably familiar with what it does: powered by CV models, it gives computers the ability to see and understand images. We’ve already discussed what the different forms of CV focus on, so below we’ll look at three core techniques used in these applications and how each one affects your results: Classification, Detection, and Segmentation.
Before we get started though, let’s first go over two basic definitions that will help you as you learn about each technique:
- Labeling: Labeling is the process of taking a dataset of unlabeled images and adding meaningful, informative tags that describe what is in each image.
- Bounding boxes: Bounding boxes are a type of labeling in which a box is drawn around an object in an image (it can be created, edited, and deleted), and one or more concepts are assigned to that box.
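To make the two definitions above concrete, here is a minimal sketch of how a labeled bounding box is often represented in code: pixel coordinates for the box corners plus the concepts assigned to it. The `BoundingBox` class and its fields are illustrative, not any particular tool’s API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BoundingBox:
    # Corner coordinates in pixels: (x1, y1) is top-left, (x2, y2) is bottom-right.
    x1: int
    y1: int
    x2: int
    y2: int
    concepts: List[str] = field(default_factory=list)  # labels assigned to this box

    def area(self) -> int:
        """Area of the box in pixels (0 if the corners are inverted)."""
        return max(0, self.x2 - self.x1) * max(0, self.y2 - self.y1)

# A 100x200-pixel box labeled with one concept.
box = BoundingBox(10, 20, 110, 220, concepts=["dog"])
```

Editing the box is then just updating its coordinates, and relabeling is just changing the `concepts` list.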
So, what is Classification?
Classification refers to a type of labeling where an image/video is assigned certain concepts, with the goal of answering the question, “What is in this image/video?”
An image can be classified into a number of categories. For example, the below screenshot, taken from one of our model demos, shows an image that has been uploaded to Clarifai’s General Model.
As we can see, the model has given us a list of predicted concepts. These represent how the model has classified the image with each concept representing a different “classification.”
This technique is useful if you just need to answer a general question like “Is this a beach or a pool?”
Okay, so what is Object Detection?
Object detection is a computer vision technique that deals with distinguishing between objects in an image or video. While it is related to classification, it is more specific in what it identifies: it applies classification to distinct objects in an image/video and uses bounding boxes to tell us where each object is. Face detection is one form of object detection.
This technique is useful if you need to identify particular objects in a scene, like the cars parked on a street, versus the whole image.
Below, we can see the difference between classification and object detection:
Figure 1: Classification
Figure 2: Object Detection (Face Detection Model)
As we can see, figure 1 lists everything the model sees in the image, while in figure 2 only the faces are detected, each isolated with a bounding box.
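A common way to judge how well a detected box matches an object is intersection over union (IoU): the overlapping area of two boxes divided by the area of their combined footprint (1.0 means a perfect match, 0.0 means no overlap). A minimal sketch, assuming boxes given as `(x1, y1, x2, y2)` pixel coordinates:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping region (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

Detection benchmarks typically count a prediction as correct when its IoU with a ground-truth box clears some threshold, often 0.5.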
So then what is Segmentation?
Segmentation is a type of labeling where each pixel in an image is assigned a concept. The whole image is divided into pixel groupings that can then be labeled and classified, with the goal of simplifying the image, or changing how it is presented to the model, so that it is easier to analyze.
Segmentation models provide the exact outline of an object within an image: pixel-by-pixel detail for a given object. Classification models, by contrast, identify what is in an image, and Detection models place a bounding box around specific objects.
It is related to both image classification and object detection; in many pipelines, both techniques take place before segmentation begins. Once the object in question is isolated with a bounding box, you can then trace a pixel-by-pixel outline of that object in the image.
Segmentation is particularly useful if you need to ignore the background of an image, like if you want a model to identify and tag a shirt in a fashion editorial image taken on a busy street.
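Under the hood, a segmentation result is just a per-pixel label map (a “mask”) the same size as the image. This toy example builds a tiny grayscale image and mask by hand and uses the mask to keep only the “shirt” pixels, zeroing out the background; the 4x4 image, pixel values, and class ids are made up for illustration.

```python
# Toy 4x4 grayscale image and a per-pixel mask (0 = background, 1 = shirt).
image = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

def apply_mask(image, mask, keep=1):
    """Zero out every pixel whose mask label is not `keep`."""
    return [
        [px if m == keep else 0 for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

masked = apply_mask(image, mask)  # only the four "shirt" pixels survive
```

In a real system the mask comes from a segmentation model rather than being written by hand, but the idea is the same: downstream steps can analyze just the labeled pixels and ignore everything else.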
While human beings have always been able to do all the above in the blink of an eye, it’s taken many years of research, trial, and error to allow computers to emulate us. Nevertheless, today, thanks to computer vision, our devices are finally catching up to our needs.