Workflows in Clarifai: A Tutorial

One of the most powerful features of Clarifai is the ability to combine machine learning models like they are nodes in a graph. This is done through workflows. With workflows, you can chain together multiple models to design a multimodal system.
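To make the "nodes in a graph" idea concrete, here is a minimal sketch in plain Python. It does not use Clarifai at all: `fake_ocr`, `fake_translate`, and `run_chain` are hypothetical stand-ins invented for illustration, and the lookup tables inside them are made up. The only point is that each node consumes the previous node's output.

```python
from typing import Any, Callable, List


def fake_ocr(image_name: str) -> str:
    """Stand-in for an OCR model (hypothetical, not a Clarifai model)."""
    return {"stop_sign.jpg": "stop"}.get(image_name, "")


def fake_translate(text: str) -> str:
    """Stand-in for an English-to-Spanish model (hypothetical)."""
    return {"stop": "alto"}.get(text, text)


def run_chain(nodes: List[Callable[[Any], Any]], first_input: Any) -> Any:
    """Pass the input through each node in order, feeding outputs forward."""
    value = first_input
    for node in nodes:
        value = node(value)
    return value


result = run_chain([fake_ocr, fake_translate], "stop_sign.jpg")
print(result)
```

A real workflow does the same thing on the platform side, with hosted models in place of these toy functions.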

This feature is going to make your life much easier, trust us. Read on to find out how.

But first…

What’s a multimodal system?

A multimodal system in AI refers to a system that can understand, process, and integrate information from multiple types of inputs or "modes". These modes can be text, voice, images, or videos. For example, a chatbot that can understand text messages and voice commands is a multimodal system.

Here’s a quick video on how you can use workflows to chain together multiple models and guide and direct model behavior.

Step 1: Set Up Your Application

Navigate to https://clarifai.com/explore and click on Create to start your application.

Provide it with a unique name.

Write a short description.

Choose an input type.

Select Create App.

You don’t need to choose a model. You now have an app that acts as a container where you can assemble your workflows.

Step 2: Create an Optical Character Recognizer (OCR) Workflow

Workflows have many applications, so feel free to build one from whichever models you like. For this tutorial, we’ll read text from images and then translate it.

Here’s how:

Click Workflows in the left panel, and then click Create Workflow in the upper right.

You’ll see a no-code, drag-and-drop interface for connecting models.

Scroll down until you see an optical character recognizer model. This type of model extracts text, such as the words on a street sign, from an image.

Next, look for a text-to-text model, which transforms one form of text into another.

Draw connections between the models, defining the flow of information from one model to the subsequent one.

Click on each node to select the specific model to use at that step of the workflow. For this example, we'll use the PaddleOCR model for the OCR step and the English-to-Spanish translation model for the text-to-text step.

Once everything is connected correctly, save your workflow.

Now, test this workflow with sample images. The output should contain the text read from each image, followed by its Spanish translation. Hurray!
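Once the workflow is saved, you can also call it from code instead of the UI. The sketch below only builds the HTTP request and does not send it. The endpoint path, the `Key` authorization header, and the payload shape reflect our understanding of Clarifai's public REST API, so check them against the current API reference before relying on them. The user ID, app ID, workflow ID, token, and image URL are all placeholders, and `build_workflow_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_workflow_request(user_id, app_id, workflow_id, pat, image_url):
    """Build (but do not send) a workflow prediction request for one image URL."""
    # Assumed endpoint layout; verify against the current Clarifai API docs.
    url = (
        f"https://api.clarifai.com/v2/users/{user_id}"
        f"/apps/{app_id}/workflows/{workflow_id}/results"
    )
    body = {"inputs": [{"data": {"image": {"url": image_url}}}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Key {pat}",  # personal access token (placeholder)
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
req = build_workflow_request(
    "my-user", "my-app", "ocr-translate", "MY_PAT",
    "https://example.com/street-sign.jpg",
)
print(req.get_method(), req.full_url)
# To actually run the workflow: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect the payload first and to swap in a different input later.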

Step 3: Create an Automatic Speech Recognition (ASR) Workflow

In the same app, start a new workflow and look for an audio-to-text model.

Add and connect a text classifier model to the workflow.

Select the first model in the sequence and search for the latest wav2vec English audio-to-text model.

For the text classifier, search for "sentiment" and select the Sentiment Analysis DistilBERT model (again, the most recent version).

Save the workflow.

You can test this workflow with pre-recorded audio samples. The output should contain the transcribed text, followed by the sentiment predicted for that text.
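If you run the workflow through the API, the response holds one output per node. The sketch below pulls the transcript and the sentiment scores out of such a response. It assumes the outputs sit under `results[0].outputs`, with text in `data.text.raw` and classifier scores in `data.concepts`. The `sample_response` dictionary is made up for illustration and is not real model output, and `summarize_outputs` is a hypothetical helper.

```python
def summarize_outputs(response: dict) -> dict:
    """Collect transcribed text and (name, score) concepts from a workflow response."""
    summary = {"text": None, "concepts": []}
    for output in response["results"][0]["outputs"]:
        data = output.get("data", {})
        raw_text = data.get("text", {}).get("raw")
        if raw_text:
            summary["text"] = raw_text
        for concept in data.get("concepts", []):
            summary["concepts"].append((concept["name"], concept["value"]))
    return summary


# Made-up response for illustration; field values are NOT real model output.
sample_response = {
    "results": [
        {
            "outputs": [
                {"data": {"text": {"raw": "i really enjoyed this tutorial"}}},
                {
                    "data": {
                        "concepts": [
                            {"name": "positive", "value": 0.97},
                            {"name": "negative", "value": 0.03},
                        ]
                    }
                },
            ]
        }
    ]
}

summary = summarize_outputs(sample_response)
top_label, top_score = max(summary["concepts"], key=lambda c: c[1])
print(summary["text"])
print(top_label, top_score)
```

The same helper would work for the OCR workflow, since the translation node also returns text.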

