Our computer models are trained on a list of possible outputs (tags) to apply to any input (your content). Using machine learning, a process that enables a computer to learn from data and draw its own conclusions, our models automatically identify the correct tags for any given image or video. These models are then made easily accessible through our simple API.
Our core model identifies 11,000+ general concepts like objects, ideas, and emotions. Depending on the level of specificity you need for your data, you may need additional models. For example, our core model can tell you if something is "food" but you'll need our Food model to tell you what kind of food, like "pizza."
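To make the tagging workflow above concrete, here is a minimal sketch of parsing a tagging response and filtering by confidence. The response shape, field names, and values are hypothetical illustrations for this sketch, not the actual API's format:

```python
import json

# Hypothetical response for a photo of a meal; the real API's endpoint
# and field names are assumptions, not taken from this document.
sample_response = json.loads("""
{
  "status": "OK",
  "results": [
    {
      "tags": [
        {"name": "food",  "confidence": 0.98},
        {"name": "meal",  "confidence": 0.91},
        {"name": "table", "confidence": 0.77}
      ]
    }
  ]
}
""")

def confident_tags(response, threshold=0.8):
    """Return the tag names whose confidence meets the threshold."""
    tags = response["results"][0]["tags"]
    return [t["name"] for t in tags if t["confidence"] >= threshold]

print(confident_tags(sample_response))  # -> ['food', 'meal']
```

A more specialized model (like the Food model) would return the same response shape with finer-grained tags such as "pizza" in place of the general "food" concept.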
Deep neural networks scale to billions of parameters, giving them the capacity to learn highly complex concepts and thousands of categories. With modern hardware and an abundance of data, we are able to train larger and more powerful networks.
A trained model stores its knowledge compactly in learned parameters, making it easy to deploy in any environment. There is no need to store any additional data to make predictions for new inputs. This means we can easily run these models on embedded devices and return responses in milliseconds.
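The point that a trained model is nothing more than its stored parameters can be shown with a toy forward pass. The weights below are made up for illustration; a real network has millions or billions of parameters, but prediction works the same way: no training data is consulted, only the learned numbers:

```python
import math

# Made-up "learned" parameters for a toy two-output model.
weights = [[0.2, -0.5], [0.8, 0.1]]
biases = [0.0, 0.3]

def predict(features):
    """Forward pass using only the stored parameters.

    Each output score is a weighted sum of the input features plus a
    bias, squashed through a sigmoid activation into the range (0, 1).
    """
    scores = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, features)) + b
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores

print(predict([1.0, 2.0]))
```

Because inference is just this arithmetic over fixed parameters, it fits comfortably on embedded hardware and completes in milliseconds.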
Unlike traditional computer vision approaches, our models learn to extract discriminative features from the input using the provided training data, instead of using hand-engineered feature extractors like SIFT and LBP. This makes them easy to adapt to problems in any domain.
In 2013, we took the top five spots in the image classification task at the ImageNet Large Scale Visual Recognition Challenge. Since then, we have made further improvements in both accuracy and speed.
Our experts are also behind OverFeat, the winning entry in the 2013 ImageNet localization task, which lets us tell not only what objects are in an image but also where they are.