There has been a lot of fun poked at image recognition lately. Facial recognition can be fooled with mannequins or printouts. Someone can use photo editing tools to paste an image of an elephant into an image of a library, with the result that the AI no longer recognizes the elephant or the books. Humans, of course, would clearly recognize both. AI researchers can use “adversarial attacks,” basically their knowledge of the inner workings of the recognition system, to produce two images that look identical to the human eye but have subtle differences that “fool” the AI.
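The trick behind those near-identical image pairs can be sketched with a toy linear "classifier" and made-up numbers (real attacks target deep networks, but the idea is the same): nudge every pixel by a tiny amount in exactly the direction that hurts the model's score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": 100 pixel intensities in [0, 1], and a toy linear
# classifier with fixed random weights (a stand-in for a trained model).
x = rng.uniform(0.0, 1.0, size=100)
w = rng.normal(0.0, 1.0, size=100)
bias = w @ x - 1.0          # chosen so x sits just on the positive side
score = w @ x - bias        # score > 0 means "elephant"

# Fast-gradient-style perturbation: move each pixel a tiny step in the
# direction that lowers the score. eps is just big enough to flip it.
eps = 1.1 * abs(score) / np.abs(w).sum()
x_adv = x - eps * np.sign(w)
adv_score = w @ x_adv - bias

print(f"per-pixel change: {eps:.5f}")        # tiny, invisible to a human
print(f"original score: {score:+.3f}, adversarial score: {adv_score:+.3f}")
```

The per-pixel change is a fraction of a percent of the intensity range, yet the decision flips, which is why the two images look identical to us.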
All this leads people to think the AI is stupid, but really the AI is picking up on something we don’t notice. When we measure AI success, we can’t measure it purely by how well it mimics human recognition, because there are many recognition tasks where the AI system is actually better than the average human. Once we get beyond using humans to train AI systems, we won't be surprised when the AI provides better-than-human performance.
Take, for instance, Matt Zeiler, Clarifai’s Founder and CEO. During presentations, he likes to do a demonstration where he gives the AI system a few examples of his dog Rolly and then provides different images to test the system. After just a few samples, the "Rolly detector" is pretty good. It doesn’t confuse Rolly with a car or a cat, but sometimes it might identify other dogs as Rolly, especially dogs of the same breed.
After Matt provides even more example images with and without Rolly, those mistakes become quite rare. Of course, I don't think the "Rolly detector" can ever get better than Matt at recognizing his own dog, but at some point it gets better than you and me. You might say “well, I’ve never met Rolly,” but neither did the AI. It only saw pictures. Given the same set of images to learn from, humans do no better than AI at recognizing Rolly.
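One common way to build a few-shot detector like this is nearest-prototype classification over image embeddings. The vectors and threshold below are invented toy values, not Clarifai's system: each "photo" is a small embedding vector, the Rolly prototype is the mean of the example embeddings, and anything close enough to that prototype counts as Rolly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for image embeddings (a real system would compute these
# with a neural network). Rolly photos cluster around one point in
# embedding space; cat photos cluster around a distant one.
rolly_center = np.array([5.0, 0.0, 0.0])
cat_center = np.array([-5.0, 0.0, 0.0])

rolly_examples = rolly_center + rng.normal(0, 0.4, size=(5, 3))

# The "Rolly detector": the mean of the few examples plus a distance cutoff.
prototype = rolly_examples.mean(axis=0)

def is_rolly(embedding, threshold=2.5):
    """True if the embedding is close enough to the Rolly prototype."""
    return np.linalg.norm(embedding - prototype) < threshold

new_rolly_photo = rolly_center + rng.normal(0, 0.4, size=3)
cat_photo = cat_center + rng.normal(0, 0.4, size=3)

print(is_rolly(new_rolly_photo))  # True: close to the prototype
print(is_rolly(cat_photo))        # False: far away
```

With more example photos, the prototype estimate tightens and the threshold can shrink, which is why same-breed look-alikes stop fooling the detector as training data grows.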
I've done demonstrations of a system I worked on that was trained to recognize famous pieces of artwork, celebrities, and millions of other objects. I’ll show an image to the crowd and to the AI system at the same time and ask for the name of the painting or the celebrity. If I show the “Mona Lisa,” most of the crowd ties with the AI. However, when I show “Lady with an Ermine,” another da Vinci masterpiece, the AI recognizes it immediately while far fewer people in the crowd typically can.
The same applies to heads of state. While the AI system can recognize over one hundred heads of state, in my experience most people can't recognize the heads of state of even ten different countries! No human on their own can recognize the world's "most famous" one thousand people, and they certainly can't recognize the "most famous" hundred thousand. The AI system, on the other hand, can easily recognize them all, along with thousands of famous pieces of artwork and the millions of other objects it’s trained on. Clearly, in terms of quantity, AI systems have already surpassed human capabilities, but there’s a way to make them even better.
A colleague of mine built a system to recognize the make, model, and year of automobiles from images, focusing on the exterior design of the car. One image proved to be a problem for the system to identify. Why? Because it was trained by humans to look at what humans would look at: the design. Instead, by running the license plate number, we were able to conclude it was a 2009 Nissan Sentra that was assembled in Mexico rather than Japan, with a base trim and a 2.0L engine. None of this information is discernible to humans just by looking at the car. However, an AI system could be trained to identify license plate numbers and output this data at scale.
These three stories showcase three ways machine recognition can become better than human recognition:
- The Rolly detector shows an AI system can be trained by an expert and do better than non-expert humans.
- The famous person and art detector shows an AI system can be trained by many humans and thus do better than any single human could do on their own.
- The third system, however, shows the best way we can train AI to get better than humans: the system must be trained not just by humans but also by machines. In this case, it made use of the license plate database to extract more information than a human can see.
This third category is going to see a lot of growth very soon. For instance, radiologists used to study “gold standard” images where the truth was determined by a board of more experienced radiologists. We can now train on far more images than any one radiologist could look at, and we can train with ground truth that is determined from patient outcomes and other tests beyond just the image. We can also generate new training images by taking "easy" images and blurring them. We humans might not be able to tell the right answer from the blurred image, but the AI will still learn from it.
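That blur-based augmentation can be sketched in a few lines, assuming plain NumPy arrays as grayscale images (a real pipeline would use an imaging library and many transforms): each easy, labeled image is averaged with its neighbors to produce a harder copy that keeps the original label.

```python
import numpy as np

def box_blur(img, k=3):
    """Blur a 2D grayscale image with a k x k box filter (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):                 # sum the k*k shifted copies...
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)                # ...and average them

rng = np.random.default_rng(2)

# An "easy" labeled image and its blurred, harder copy. Both keep the
# same label, growing the training set with no new human labeling.
easy_image, label = rng.uniform(0, 1, size=(32, 32)), "tumor"
hard_image = box_blur(easy_image, k=5)

training_set = [(easy_image, label), (hard_image, label)]
print(hard_image.shape)  # (32, 32): same size, smoother content
```

The label travels with the degraded copy for free, which is exactly the point: the machine gets ground truth on an image no human labeler could confidently judge.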
Researchers are also generating images using an adversarial idea related to the one used to fool the machines. These generative adversarial networks, called GANs, pit a generator network against a discriminator network, and the images they create can be used to train AI to do a better job. Typically it requires several instances of an object to train an AI to recognize something new, but if we get machines to start making the images that train the machines, recognition can quickly surpass human capabilities.
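The adversarial training loop can be sketched in one dimension with made-up numbers instead of images (real GANs use deep networks and images): a one-parameter generator shifts noise toward the real data, while a logistic-regression discriminator tries to tell real from fake, and each player's update is a hand-derived gradient step.

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# "Real data": samples near 4.0. These would be real images in a GAN.
real_mean = 4.0

# Generator G(z) = z + b learns only a shift; discriminator
# D(x) = sigmoid(w*x + c) outputs the probability that x is real.
b, w, c = 0.0, 0.0, 0.0
lr = 0.05

for step in range(3000):
    real = rng.normal(real_mean, 1.0, size=64)
    fake = rng.normal(0.0, 1.0, size=64) + b

    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: ascend log D(fake) so fakes look more real.
    b += lr * np.mean((1 - d_fake) * w)

print(f"generator shift b = {b:.2f} (real data centered at {real_mean})")
```

After training, the generator's shift lands near the real data's center: the generator has learned to produce convincing "fakes" purely from the discriminator's feedback, which is the mechanism that lets machines manufacture training data for other machines.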
So what about that elephant in the room, or better, the one in the sky that the AI classified as an airplane? If I went outside and saw an elephant in the sky I'd be very surprised, but I would not think that elephants had learned to levitate. Maybe I would think it was a hot air balloon shaped like an elephant or an airplane painted to look like an elephant for some festival. Maybe that AI system that fails to recognize an elephant in the sky isn't stupid after all.