This tutorial shows you how to integrate Clarifai with Snowflake. Clarifai can automatically recognize, classify, and label unstructured data streams in the form of images, video, text, and audio. You can now link Clarifai's predictive capabilities with Snowflake's recently released support for unstructured data. Bringing these two platforms together allows you to analyze unstructured data and leverage powerful tools for managing data pipelines.
Clarifai provides the most complete, accurate, and user-friendly platform for working with and understanding unstructured data. In essence, this means that by using the most advanced tools in deep learning AI, Clarifai helps you transform unstructured data (the type of data that you typically see in the form of images, video, text, and audio) into structured data: data that is organized in a way that is useful and meaningful to you.
But what can you do with your data once it is structured in the way you want? What if you want to expand your capabilities with a powerful suite of tools that can help you accelerate data analytics, improve data access, and construct data pipelines?
This is where integrating Clarifai with a platform like Snowflake can make so much sense. Combining the functionality of these two platforms can provide unprecedented insights and control over your data pipeline. We will be setting up this integration to trigger a call to the Clarifai API, and then format the response in a way that can be used within the Snowflake platform.
You will need to set up a call to an external function through AWS Lambda that will call the Clarifai API and then map the response into Snowflake. AWS Lambda functions let you run code for applications and backend services. Snowflake offers in-depth instructions on how to do this here. In our case, you will create layers over AWS Lambda; please reference this documentation for more information. To integrate with Clarifai, you will need the following code and corresponding function.
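A minimal Python sketch of what the Lambda handler can look like. It assumes Snowflake's external-function JSON contract, where the request body is `{"data": [[row_number, arg1, ...], ...]}` and the response must return the same row numbers in the same shape. The actual Clarifai API call is left as a stub (`call_clarifai` is a placeholder you would implement with your API key and workflow endpoint); the `predict` parameter exists only so the stub can be swapped out.

```python
import json

def call_clarifai(input_value):
    # Placeholder: in a real Lambda this would POST the input to the
    # Clarifai API (with your API key) and return the prediction JSON.
    raise NotImplementedError

def lambda_handler(event, context, predict=call_clarifai):
    # Snowflake sends a JSON body of the form:
    #   {"data": [[row_number, arg1, ...], ...]}
    # and expects the response body to echo the row numbers back:
    #   {"data": [[row_number, result], ...]}
    body = json.loads(event["body"])
    rows = []
    for row in body["data"]:
        row_number, value = row[0], row[1]
        rows.append([row_number, predict(value)])
    return {
        "statusCode": 200,
        "body": json.dumps({"data": rows}),
    }
```

Keeping the row numbers intact is essential: Snowflake uses them to join each result back to the row that produced it.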
Once we have registered the external function, we can call upon various Clarifai models and pass arguments to them. In this example, we first call the Clarifai General model, which can identify over 11,000 concepts in images, and then we call the Clarifai Named Entity Recognition (ner_english) model, which can identify key entities in text.
In order to use these models, you will need to create an application and then add the desired model to a workflow. In this example, we create two different workflows and add just one model to each (general and ner_english, respectively). You can learn more about working with workflows here.
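As a sketch of what the Lambda code sends to Clarifai, the helper below builds the JSON body for a workflow prediction request (the kind posted to Clarifai's workflow-results endpoint). The function name and the exact endpoint path are assumptions for illustration; the `inputs`/`data` structure follows Clarifai's API conventions.

```python
def workflow_request(inputs, input_type="image"):
    """Build the JSON body for a Clarifai workflow prediction call,
    e.g. POST /v2/workflows/{workflow_id}/results.

    `input_type` is "image" (a URL) for the general workflow or
    "text" (a raw string) for the ner_english workflow."""
    body = {"inputs": []}
    for value in inputs:
        if input_type == "image":
            data = {"image": {"url": value}}
        else:
            data = {"text": {"raw": value}}
        body["inputs"].append({"data": data})
    return body
```

For example, `workflow_request(["https://samples.clarifai.com/metro-north.jpg"])` produces the body for the general workflow, while `workflow_request(["Some sentence."], input_type="text")` produces one for ner_english.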
The following code shows you how to create an external function, and includes a couple of example calls to Clarifai workflows.
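A sketch of the Snowflake side, written as SQL strings you might execute from a script through your Snowflake client. The function name (`clarifai_predict`), the API integration name, and the API Gateway URL placeholder are all illustrative; substitute your own deployment's values.

```python
# DDL to register the Lambda-backed external function in Snowflake.
# Assumes an API integration (clarifai_api_integration) pointing at your
# AWS API Gateway has already been created; the AS URL is a placeholder.
CREATE_EXTERNAL_FUNCTION_SQL = """
CREATE OR REPLACE EXTERNAL FUNCTION clarifai_predict(workflow_id VARCHAR, input VARCHAR)
  RETURNS VARIANT
  API_INTEGRATION = clarifai_api_integration
  AS 'https://<api-gateway-id>.execute-api.us-east-1.amazonaws.com/prod/clarifai';
"""

# Example calls against the two workflows described above.
EXAMPLE_CALLS_SQL = """
-- Label an image with the workflow wrapping the General model.
SELECT clarifai_predict('general', 'https://samples.clarifai.com/metro-north.jpg');

-- Extract entities with the workflow wrapping the ner_english model.
SELECT clarifai_predict('ner_english', 'Snowflake supports unstructured data.');
"""
```

Returning `VARIANT` lets the Clarifai response land in Snowflake as queryable semi-structured JSON.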
Snowflake data pipelines automate many of the manual steps involved in transforming and optimizing continuous data loads. Typically, data is first loaded into a staging table used for interim storage and then transformed using a series of SQL statements before it is inserted into the destination reporting tables. The most efficient workflow for this process involves transforming only data that is new or modified.
Streams are typically used as part of this staging pipeline. A stream object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change so that actions can be taken using the changed data. Learn more about streams here.
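The staging pattern above can be sketched as SQL (again carried as strings for execution from a script). Table, stream, and column names are illustrative; the key idea is that reading a stream inside a DML statement advances its offset, so each change is processed only once.

```python
# Sketch: capture changes to a staging table and merge only new or
# modified rows into the reporting table. Names are placeholders.
STREAM_PIPELINE_SQL = """
-- Record DML changes (inserts, updates, deletes) made to the staging table.
CREATE OR REPLACE STREAM staging_stream ON TABLE staging_predictions;

-- Consume only new or modified rows; using the stream as a DML source
-- advances its offset, so the same rows are not reprocessed.
MERGE INTO reporting_predictions r
USING staging_stream s
  ON r.input_id = s.input_id
WHEN MATCHED THEN UPDATE SET r.prediction = s.prediction
WHEN NOT MATCHED THEN INSERT (input_id, prediction) VALUES (s.input_id, s.prediction);
"""
```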
To make newly staged files visible in the directory table, you need to refresh it. Once you have created a stream on the stage, each refresh causes the stream to automatically record the new files, so you can query the stream for recent events.
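These steps can be sketched as follows (SQL as strings, as before). The stage and stream names are illustrative, and the stage is assumed to have been created with a directory table enabled (`DIRECTORY = (ENABLE = TRUE)`).

```python
# Sketch: stream over a stage's directory table, refresh to pick up new
# files, then read the recent file events. Names are placeholders.
DIRECTORY_STREAM_SQL = """
-- Create a stream over the stage's directory table.
CREATE OR REPLACE STREAM file_stream ON STAGE clarifai_stage;

-- Refresh the directory table so newly uploaded files become visible;
-- the stream records them as inserts.
ALTER STAGE clarifai_stage REFRESH;

-- Fetch the recent file events, e.g. to send to the external function.
SELECT relative_path, file_url
  FROM file_stream
 WHERE METADATA$ACTION = 'INSERT';
"""
```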