What is visual recognition?

Visual recognition is the ability to automatically understand an image or video based on its visual elements and patterns. Clarifai applies machine learning to image and video recognition, helping customers understand and manage their media.

How does Clarifai's visual recognition API work?

Our models are trained on a list of possible outputs (tags) that they can apply to any input (your content). Machine learning is a process that enables a computer to learn from data and draw its own conclusions. Using it, our models automatically identify the correct tags for any given image or video. These models are then made accessible through our simple API.
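As a rough illustration of the idea (not Clarifai's actual API or model), a tagging model can be thought of as a function that scores every tag in a fixed vocabulary and keeps the tags whose score clears a threshold. The names `TAGS` and `predict_tags`, the threshold, and the scores below are all hypothetical.

```python
# Hypothetical sketch: a fixed tag vocabulary plus one score per tag.
TAGS = ["dog", "cat", "food", "beach"]

def predict_tags(scores, threshold=0.5):
    """Return (tag, score) pairs whose score meets the threshold, best first."""
    if len(scores) != len(TAGS):
        raise ValueError("expected one score per tag")
    pairs = [(tag, s) for tag, s in zip(TAGS, scores) if s >= threshold]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

# Made-up scores standing in for a trained model's output on one image.
example_scores = [0.92, 0.03, 0.10, 0.71]
print(predict_tags(example_scores))  # [('dog', 0.92), ('beach', 0.71)]
```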

Our core model identifies 11,000+ general concepts like objects, ideas, and emotions. Depending on the level of specificity you need for your data, you may need additional models. For example, our core model can tell you if something is "food" but you'll need our Food model to tell you what kind of food, like "pizza."
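One way to picture combining a general model with a specialized one is a two-step lookup: tag with the general model first, then refine with the specialized model only when its domain applies. This is a minimal sketch under that assumption. The functions `general_model`, `food_model`, and `tag_image` are hypothetical stand-ins, and each "image" is a plain dictionary carrying canned tags so the example runs without a real model.

```python
def general_model(image):
    # Stand-in for a broad model: returns coarse tags only (hypothetical).
    return image["coarse_tags"]

def food_model(image):
    # Stand-in for a specialized model with finer-grained tags (hypothetical).
    return image["food_tags"]

def tag_image(image):
    """Run the general model, then refine with the food model if it applies."""
    tags = list(general_model(image))
    if "food" in tags:
        tags.extend(food_model(image))
    return tags

# Canned inputs: each dict plays the role of an image with known answers.
pizza_photo = {"coarse_tags": ["food", "table"], "food_tags": ["pizza"]}
street_photo = {"coarse_tags": ["car", "street"], "food_tags": []}

print(tag_image(pizza_photo))   # ['food', 'table', 'pizza']
print(tag_image(street_photo))  # ['car', 'street']
```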

Convolutional Neural Networks

Benefits of the deep learning approach


Deep neural networks scale to billions of parameters, giving them the capacity to learn highly complex concepts and thousands of categories. With modern hardware and an abundance of data, we are able to train larger and more powerful networks.
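To give a sense of how parameter counts add up, the sketch below counts the weights and biases in a few layers. The layer sizes are invented for illustration and do not describe any Clarifai model. Even this tiny stack has several million parameters, most of them in the final fully connected layer.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel for a 2D convolution layer."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

def dense_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_features * out_features + out_features

# Illustrative layer sizes only.
layers = [
    conv_params(3, 3, 3, 64),         # 3x3 conv, RGB input -> 64 feature maps
    conv_params(3, 3, 64, 128),       # 3x3 conv, 64 -> 128 feature maps
    dense_params(128 * 7 * 7, 1000),  # classifier over 1,000 categories
]
print(layers)       # [1792, 73856, 6273000]
print(sum(layers))  # 6348648
```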


A trained model stores its knowledge compactly in learned parameters, making it easy to deploy in any environment. There is no need to store any additional data to make predictions for new inputs. This means that we can run our models on embedded devices and provide responses in milliseconds.
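The point that predictions need only the learned parameters can be sketched with a toy linear classifier: the `PARAMS` dictionary is the entire model, and no training data is consulted at prediction time. The weights, labels, and the `predict` function are hypothetical and far smaller than a real network.

```python
import math

# Hypothetical learned parameters: this dict is the entire "model".
PARAMS = {
    "weights": [[2.0, -1.0], [-1.0, 2.0]],  # one row per class
    "biases": [0.0, 0.0],
    "labels": ["cat", "dog"],
}

def predict(features, params=PARAMS):
    """Score each class from the stored parameters; return (label, probability)."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(params["weights"], params["biases"])
    ]
    peak = max(logits)  # subtract the max before exponentiating, for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return params["labels"][best], probs[best]

label, prob = predict([1.0, 0.0])
print(label, round(prob, 3))  # cat 0.953
```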


Unlike traditional computer vision approaches, our models learn to extract discriminative features from the input using the provided training data, instead of using hand-engineered feature extractors like SIFT and LBP. This makes them easy to adapt to problems in any domain.
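To make the contrast concrete, the sketch below applies a hand-designed filter to a small image. A Sobel kernel is used here as a simpler stand-in for hand-engineered extractors such as SIFT and LBP, which are considerably more elaborate. In a convolutional network the same sliding-window operation is used, but the kernel values are parameters learned from training data instead of being fixed in advance. The function name `correlate2d` and the toy image are assumptions made for this example.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image ('valid' region), summing elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hand-engineered filter: a Sobel kernel that responds to vertical edges.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# 4x4 toy image: dark left half, bright right half.
image = [[0, 0, 1, 1] for _ in range(4)]
print(correlate2d(image, SOBEL_X))  # [[4, 4], [4, 4]]
```

A flat image with no edges produces all zeros under the same kernel, which is what makes the response a useful feature.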

Clarifai is at the forefront of the deep learning revolution

Pushing the state of the art in large scale object recognition

In 2013, we took the top five places in the image classification task at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Since then we have made further improvements in both accuracy and speed.

Leading the way in object localization

Our experts are also behind the winning entry (OverFeat) in the 2013 ImageNet localization task, so we can tell not only what objects are in an image but also where they are.

Chart: ImageNet classification error rates (smaller is better), comparing:
  • Clarifai: 2013 Advances (10x faster, 5x less memory)
  • Clarifai: ImageNet 2013 Winning Entry
  • 2012 ImageNet Winners
  • 2012 Traditional Computer Vision Methods