We’ve already gone over what computer vision is and the many ways it is impacting businesses today. But how does it work? Here’s a quick rundown of the “brains” behind this type of artificial intelligence.
1) What is a computer vision model?
A computer vision (CV) model is a processing block that takes uploaded inputs, like images or videos, and predicts or returns pre-learned concepts or labels.
Examples of this technology include image recognition, visual recognition, and facial recognition.
2) So what can models see?
Models can be trained to see almost anything humans can see. Our General Model, for instance, can recognize 11,000+ concepts. We also have more focused pre-trained models that recognize concepts related to specific things like “weddings” or “travel.” Different models may predict different concepts for the same inputs based on their training.
In addition to being pre-trained, models can also be custom trained to see niche concepts that are unique to a person, business, or project.
3) Wait, custom trained?
That’s right. You can use your own data and teach the model to see and recognize what you want, whether that’s the perfect slice of toast or sign language. The Spooky or Not model we created for Halloween is an example of a custom model. It was trained on data that was specific to Halloween, so it could learn to tell whether an image was “spooky” or not. To do this, the model had to be given images that showed “spooky” concepts (“positive” examples of the desired concept), such as scary dolls, and images that showed the opposite (“negative” examples of the desired concept), like an American Girl doll.
Custom models are built on top of pre-trained models, called base workflows, which act as a foundation on which the new model can learn. Think of an English-speaking adult learning a new language. They will already have the base knowledge, like knowing what a chair is, on which to build their language skills. Building a custom model is similar to that person only needing to learn a new word for chair, rather than what a chair even is.
An example of this is PopSugar’s Twinning app being built on top of Clarifai’s Celebrity model, which was trained to recognize famous faces. Most custom models, however, can just use our General model.
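To make the “building on a base” idea a little more concrete, here is a minimal Python sketch. It is not Clarifai’s API or training method: `base_embed` is a made-up stand-in for a frozen pre-trained model, the pixel lists are toy data, and the idea that “spooky” images are dark with high contrast is an assumption invented for this example. The point is only that the base stays fixed while a small layer on top learns the new concept.

```python
import math

def base_embed(pixels):
    """Hypothetical frozen base: turns raw pixel values into two features.
    A real base workflow would be a deep network; this is only a stand-in."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=500, lr=0.5):
    """Train only a small logistic 'head' on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            feats = base_embed(pixels)  # the base itself is never updated
            pred = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            err = pred - label
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
            bias -= lr * err
    return weights, bias

def predict(head, pixels):
    """Probability that the input shows the custom concept."""
    weights, bias = head
    feats = base_embed(pixels)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)

# Toy inputs with pixel values in [0, 1]; label 1 = "spooky", 0 = not.
examples = [
    ([0.1, 0.0, 0.9, 0.1], 1),
    ([0.0, 0.2, 1.0, 0.0], 1),
    ([0.7, 0.8, 0.9, 0.8], 0),
    ([0.6, 0.7, 0.7, 0.6], 0),
]
head = train_head(examples)

print(predict(head, [0.05, 0.1, 0.95, 0.0]))  # dark, high contrast
print(predict(head, [0.7, 0.7, 0.8, 0.75]))   # bright, low contrast
```

Because the base already knows how to describe an image, the head only has to learn a handful of numbers, which is why a custom model can get going with so few examples.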
4) How do I build my own?
Thanks to our application programming interface (API), creating your own model is actually pretty easy. To gain access to our API, you can sign up for a Clarifai account and get your free API key. After a few more quick steps, you’ll be ready to get started.
Step 1: Select one of our 11 pre-trained models as your base workflow.
Step 2: Upload images and/or videos that show the concepts you want your model to learn.
Step 3: Assign labels to your images or videos, telling the model what each input is showing.
Step 4: Click the “Train Model” button. When the model is finished training, the status will change to “Model trained successfully.”
You can read more about this process or watch our Senior Developer Evangelist, Skip Everling, in action here.
Things to remember
1) The more examples your model gets, the better it learns. That being said, your model won’t need too many inputs to start learning. You can start with as few as 10 inputs and just add more as needed.
2) When you upload an input to a model, your model is seeing this input. The concepts it returns or predicts are the model telling you what it sees. What those concepts are depends on how you have labelled your images.
For instance, if you want your model to learn to recognize hammers, you need to:
- Upload inputs: Give it a few examples of what a hammer looks like and doesn’t look like.
- Train it: Tell it these examples show what a hammer looks like or doesn’t look like, depending on the input.
The next time you upload an image or video, the model will be able to tell you if it is a hammer or not.
Giving your model both positive examples (e.g. pictures of hammers) and negative examples (e.g. pictures of screwdrivers) of your concepts is critical to your model’s success.
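If you keep your labels in a list before uploading, a quick sanity check like the one below can catch a lopsided training set early. This is a sketch only: the `(filename, concept, is_positive)` tuple layout and the `review_training_set` helper are hypothetical and not part of any Clarifai tooling, and the threshold of 10 simply reuses the starting size mentioned earlier in this post.

```python
from collections import Counter

MIN_INPUTS = 10  # starting size mentioned earlier in this post

def review_training_set(labelled_inputs, concept):
    """Return a list of problems found for one concept.

    labelled_inputs: list of (filename, concept, is_positive) tuples.
    This layout is hypothetical, chosen only for this sketch.
    """
    counts = Counter(
        is_positive
        for _, name, is_positive in labelled_inputs
        if name == concept
    )
    positives, negatives = counts[True], counts[False]
    problems = []
    if positives == 0:
        problems.append("no positive examples")
    if negatives == 0:
        problems.append("no negative examples")
    if positives + negatives < MIN_INPUTS:
        total = positives + negatives
        problems.append(f"only {total} inputs, fewer than {MIN_INPUTS}")
    return problems

# Six hammers (positive) and six screwdrivers (negative) for "hammer".
hammer_set = (
    [(f"hammer_{i}.jpg", "hammer", True) for i in range(6)]
    + [(f"screwdriver_{i}.jpg", "hammer", False) for i in range(6)]
)
# Twelve hammers and nothing else.
positives_only = [(f"hammer_{i}.jpg", "hammer", True) for i in range(12)]

print(review_training_set(hammer_set, "hammer"))
print(review_training_set(positives_only, "hammer"))
```

An empty list means the set has both kinds of examples and meets the starting size; anything else is a prompt to gather more inputs before training.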
Take the following tweet as an example:
“I generally think of myself as an okay father but somehow I forgot to teach my two year old son what an owl was and he thought it was called a wood penguin” (Gevalt-left, @crookedroads770, 10 June 2018)
Models are a lot like the two-year-old, with the “owl” being an input. Without “negative” examples to teach it “this is not what a penguin looks like,” a model will call the owl a “penguin.” Computer vision models aren’t as smart as humans, though. Unlike the toddler, a model can’t use its “base workflow” (like the two-year-old recognizing the difference in habitat) to come up with a new label for the owl (“wood penguin”).
That said, the model will see that this “penguin” looks different from the “positive” examples of penguins it was trained on, and so give it a lower probability score, indicating that while it thinks this too is a penguin, something isn’t right.
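Here is a toy Python illustration of that lower score. It is a deliberately simplified sketch, not how a real computer vision model computes probabilities: the three features and all of the numbers are made up, and the score is just a similarity to the average “positive” penguin.

```python
import math

def centroid(vectors):
    """Average of the positive examples, feature by feature."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def penguin_score(features, positives):
    """Score in (0, 1]; 1.0 means identical to the average positive example."""
    center = centroid(positives)
    dist_sq = sum((f - c) ** 2 for f, c in zip(features, center))
    return math.exp(-dist_sq)

# Made-up features: [black-and-white plumage, upright posture, lives on ice]
penguins = [[1.0, 1.0, 1.0], [0.9, 1.0, 0.9], [1.0, 0.9, 1.0]]
new_penguin = [0.95, 1.0, 1.0]
owl = [0.6, 0.9, 0.4]

print("penguin:", penguin_score(new_penguin, penguins))
print("owl:", penguin_score(owl, penguins))
```

With only “penguin” to choose from, the owl still lands on the penguin side, but its score comes out noticeably lower than a real penguin’s. Adding owls as negative examples is what lets a model stop calling them penguins at all.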
And there we have it, a quick guide to computer vision models and building your very own. Want to learn more? Feel free to contact us any time. We’re always here to help!