October 12, 2017

Search Images By Visual Similarity with the Clarifai API


When you’re searching for images, words are often not enough to find exactly what you need. Wouldn’t it be amazing if you could just show your computer a picture and say, “Find me images that look like this?” With the Clarifai API, you can search for any image by visual similarity – here’s how!

The Clarifai Search API has a variety of different ways for you to query your inputs. In one of our previous posts, we talked about searching your content by geo location. In this post, we will have an image do all of the talking. You won’t need to do any training on your dataset; simply upload images and then you can search over those images seamlessly.

Getting Started

First, you need Python (version 3.6.2) installed for your operating system. To check, open a terminal and run python --version. If you also have Python 2 installed, you may need to run python3 --version instead. You should see output reading: Python 3.6.2. If neither command works, check whether Python is included in your PATH environment variable.

Many Python developers suggest using a “virtual environment”. Virtual environments help manage application-specific dependencies. You won’t need one for this tutorial, but if you are curious you can read more about them in the Python documentation.

Now, install the official Clarifai Python client using pip install clarifai. The last thing you’ll need to do is sign up for a free Clarifai account and create an application.

Adding your inputs

You will need a dataset of inputs to search against. These inputs get indexed in your application so that Clarifai can “see” each image. To a machine, an image is nothing but vectors that describe what each pixel looks like. When you search using an image, Clarifai checks how close these vectors are to one another to determine whether the images are visually similar. Visual similarity search works best when your dataset includes objects relevant to what you’d like to search for. If you added a dataset containing only food and then searched using an image of a dog, your search results wouldn’t be as strong.
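To make "how close these vectors are" more concrete, here is a minimal sketch of one common closeness measure, cosine similarity. This post does not say which measure Clarifai uses internally, so treat this purely as an illustration; the three short vectors are invented and are far smaller than anything a real model would produce.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical image vectors, invented for illustration only
cookie = [0.9, 0.1, 0.3]
biscuit = [0.8, 0.2, 0.4]
dog = [0.1, 0.9, 0.2]

# The cookie/biscuit pair points in nearly the same direction,
# so it scores higher than the cookie/dog pair.
print("cookie vs biscuit:", cosine_similarity(cookie, biscuit))
print("cookie vs dog:", cosine_similarity(cookie, dog))
```

Under this measure, vectors that point in similar directions score close to 1, which mirrors the intuition behind a similarity ranking.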

Here’s a dataset of food images from ImageNet. The file contains nothing but image URLs, one per line. Save it as food-data.txt in the same directory as your code; we will then read this file and upload the images in batches.

# upload.py
import os
from clarifai.rest import ClarifaiApp
from clarifai.rest import Image as ClImage

app = ClarifaiApp(api_key='YOUR_API_KEY')

FILE_NAME = 'food-data.txt'
FILE_PATH = os.path.join(os.path.curdir, FILE_NAME)

# Counter variables
current_batch = 0
counter = 0
batch_size = 32

with open(FILE_PATH) as data_file:
    images = [url.strip() for url in data_file]
    row_count = len(images)
    print("Total number of images:", row_count)

while counter < row_count:
    print("Processing batch: #", (current_batch + 1))

    # Take the next batch_size URLs. Slicing stops at the end of the list,
    # so the final batch may be shorter and no image is skipped.
    imageList = [ClImage(url=url) for url in images[counter:counter + batch_size]]

    # Upload the whole batch in a single request
    app.inputs.bulk_create_images(imageList)

    counter = counter + batch_size
    current_batch = current_batch + 1

Wowza! Over one thousand images being uploaded effortlessly.
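The batching logic in upload.py can also be isolated in a small helper, which makes the batch boundaries easy to check without calling the API. This is only a sketch: chunked is a hypothetical helper name, not part of the Clarifai client, and the URLs are placeholders rather than the real dataset.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder URLs standing in for the lines of food-data.txt
urls = ["http://example.com/img{}.jpg".format(i) for i in range(70)]

for number, batch in enumerate(chunked(urls, 32), start=1):
    print("Batch #{}: {} images".format(number, len(batch)))
```

With 70 placeholder URLs and a batch size of 32, this yields two full batches and a final batch of 6. Each batch could then be wrapped in ClImage objects and passed to app.inputs.bulk_create_images(), as in upload.py.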


Searching using an Image

Now for the part you’ve been waiting for: searching by visual similarity using an image. Let’s say you wanted to find out whether your dataset contains anything similar to this picture of cookies. All you need is:

# search.py
from clarifai.rest import ClarifaiApp

app = ClarifaiApp(api_key='YOUR_API_KEY')

# Search using a URL
search = app.inputs.search_by_image(url='https://images-gmi-pmc.edge-generalmills.com/cbc3bd78-8797-4ac9-ae98-feafbd36aab7.jpg')

for search_result in search:
    print("Score:", search_result.score, "| URL:", search_result.url)

The search_by_image() function returns a list of Image objects that wrap the response; here we simply print the score and URL of each one. You can also use image bytes or a filename to query against.
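As a rough sketch of the filename option, the helper below passes a local file path to search_by_image() instead of a URL. The helper name search_by_local_file and the file name cookies.jpg are hypothetical, and the filename and per_page keyword names are taken on the assumption that they match your installed client version, so check its documentation before relying on them.

```python
def search_by_local_file(app, path, per_page=20):
    """Query the application's indexed inputs with an image stored on disk.

    `app` is expected to be a ClarifaiApp instance, created as in search.py.
    The `filename` and `per_page` keywords are assumptions about the client.
    """
    results = app.inputs.search_by_image(filename=path, per_page=per_page)
    # Return plain (score, url) tuples so callers don't depend on client classes
    return [(result.score, result.url) for result in results]


# Example usage (requires the clarifai package and a valid API key):
# from clarifai.rest import ClarifaiApp
# app = ClarifaiApp(api_key='YOUR_API_KEY')
# for score, url in search_by_local_file(app, 'cookies.jpg'):
#     print("Score:", score, "| URL:", url)
```

Taking the app as an argument keeps the helper free of credentials and lets you try it with a stand-in object before spending API calls.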

# Response
Score: 0.8486366 | URL: http://farm4.static.flickr.com/3502/4000853007_0f1e33cdc0.jpg
Score: 0.79205513 | URL: http://farm4.static.flickr.com/3213/3080197227_e4b28c76ae.jpg
Score: 0.7901007 | URL: http://farm1.static.flickr.com/222/470272746_1674448c07.jpg
Score: 0.741455 | URL: http://farm4.static.flickr.com/3317/3289848643_bf1f2e7b5b.jpg
Score: 0.7173992 | URL: http://farm4.static.flickr.com/3620/3473362088_c90b72c819.jpg
Score: 0.68365324 | URL: http://farm1.static.flickr.com/150/365771958_06e87421d1.jpg
Score: 0.6734046 | URL: http://farm4.static.flickr.com/3077/3160541712_b879bf7a22.jpg
Score: 0.6723133 | URL: http://farm4.static.flickr.com/3399/3185443954_26bf37dc8a.jpg
Score: 0.66024935 | URL: http://farm3.static.flickr.com/2481/3943873688_f094d211a3.jpg
Score: 0.6529919 | URL: http://farm3.static.flickr.com/2124/2239822705_419fffe609.jpg
Score: 0.6473093 | URL: http://farm2.static.flickr.com/1319/540399285_142ae1822e.jpg
Score: 0.63564813 | URL: http://farm3.static.flickr.com/2030/2310386017_8741472785.jpg
Score: 0.6230167 | URL: http://farm4.static.flickr.com/3338/3501581193_be17c2d04e.jpg
Score: 0.61543244 | URL: http://farm1.static.flickr.com/228/499181350_b01a280789.jpg
Score: 0.61172754 | URL: http://farm4.static.flickr.com/3169/2802528597_0483e7aa39.jpg
Score: 0.6057524 | URL: http://farm3.static.flickr.com/2111/2413962121_41b412c39c.jpg
Score: 0.60092676 | URL: http://farm1.static.flickr.com/118/296736486_e721b93e82.jpg
Score: 0.60023034 | URL: http://farm3.static.flickr.com/2264/2410275528_d7a69df963.jpg
Score: 0.5992471 | URL: http://farm3.static.flickr.com/2317/2435454915_2947203717.jpg
Score: 0.5968622 | URL: http://farm3.static.flickr.com/2664/3979835380_7748ddf164.jpg

The search response includes a score from 0 to 1 for each result. A score closer to 1 means the image is more visually similar; a score closer to 0 means the image is less visually similar. By default, the response contains the top 20 results. To change that, pass a different value for the per_page parameter of the search_by_image() function.
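Because every result carries a score, you can also discard weak matches on your side. Here is a minimal sketch, assuming only that each result exposes score and url attributes as in search.py. The Hit tuple is a stand-in for the client's result objects, and the 0.7 cutoff is an arbitrary value chosen for illustration, not a recommended threshold.

```python
from collections import namedtuple


def filter_by_score(results, min_score=0.7):
    """Keep only the results whose similarity score is at least min_score."""
    return [r for r in results if r.score >= min_score]


# Stand-in for the client's result objects; only .score and .url are used
Hit = namedtuple("Hit", ["score", "url"])

# Three rows copied from the sample response above
sample = [
    Hit(0.8486366, "http://farm4.static.flickr.com/3502/4000853007_0f1e33cdc0.jpg"),
    Hit(0.741455, "http://farm4.static.flickr.com/3317/3289848643_bf1f2e7b5b.jpg"),
    Hit(0.5968622, "http://farm3.static.flickr.com/2664/3979835380_7748ddf164.jpg"),
]

strong = filter_by_score(sample, min_score=0.7)
for hit in strong:
    print("Score:", hit.score, "| URL:", hit.url)
```

Only the first two sample rows clear the 0.7 cutoff here. A sensible cutoff depends on your dataset, so it is worth inspecting a few result sets before settling on one.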

Conclusion

You’re probably thinking to yourself, “That’s all it took?” and the answer is yes. The part that took the longest was getting the dataset. Otherwise, performing our search took only five lines of code. And the Clarifai Search API doesn’t stop here.

If you have any questions, concerns, or even friendly notes, feel free to reach out to us at hackers@clarifai.com or comment below!