This post is a transcription of our recent tutorial video for people who just don't enjoy sitting through videos :)
One of the greatest features of Clarifai Community is the ability to use models as building blocks. I'm going to demonstrate here how I can create AI workflows by connecting two models together to create a multimodal system, that is, one that combines different types of media, like images with text.
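To make the "building blocks" idea concrete, here's a minimal sketch in plain Python (this is just an illustration of the concept, not Clarifai's actual implementation): a workflow is a chain of models where each model's output becomes the next model's input.

```python
from typing import Callable, List

def run_workflow(models: List[Callable[[str], str]], inp: str) -> str:
    """Feed the input through each model in order, like wiring
    blocks together in the workflow editor."""
    out = inp
    for model in models:
        out = model(out)
    return out

# Two toy "models": one uppercases, one adds emphasis.
shout = lambda s: s.upper()
emphasize = lambda s: s + "!"

print(run_workflow([shout, emphasize], "hello"))  # HELLO!
```

The real workflows below follow the same pattern, just with hosted deep-learning models in place of these toy functions.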
First, I create the application. This is a container for all our related models and workflows for this particular project. I give it a unique name, a short description, the language, and a default workflow.
Boom, application created.
This is what the app looks like when it's empty!
To create a new workflow, go to workflows, then create workflow.
Here we see the no-code, drag-and-drop interface for connecting models together! Let's search for an optical character recognizer, which is how computers extract text from an image, whether it's a scan of a printed page or a photo of street signs.
Now we search for a second model, a text-to-text model, which transforms one kind of text into another. Then we draw in the connections that show the flow of information from one model to the next.
Next we'll specify which model we'll use for optical character recognition, the PaddleOCR model.
Now we choose the text-to-text model by searching for the word Spanish and selecting an English-to-Spanish translation model. And we're done!
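Conceptually, the finished workflow is just function composition: the OCR model's text output feeds the translator's input. Here's a toy sketch with stand-in functions (the function names, URL, and hard-coded strings are hypothetical; the real workflow calls the hosted PaddleOCR and translation models):

```python
def ocr_model(image_url: str) -> str:
    """Toy stand-in for PaddleOCR: pretend we read this off a sign."""
    return "We can sell yours too!"

def en_to_es(text: str) -> str:
    """Toy stand-in for the English-to-Spanish translation model."""
    demo_translations = {
        "We can sell yours too!": "¡También podemos vender el tuyo!",
    }
    return demo_translations.get(text, text)

def workflow(image_url: str) -> str:
    # The wiring we drew in the editor: image -> OCR -> translation.
    return en_to_es(ocr_model(image_url))

print(workflow("https://example.com/demo-sign.jpg"))
```

The drag-and-drop editor is building exactly this kind of chain for us, without any code.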
Let's save this workflow and try it out; I've saved URLs of some demo images that we can run through it.
The models take a moment to run the first time, but after that they're in memory and run more quickly. As you can see, it's correctly read the text on the sign in the image and translated it into Spanish: ¡También podemos vender el tuyo! ("We can sell yours too!")
Taking a look at other images, it's identified all the text and translated it, including a graphic with text, text printed on hanging flags, and a bilingual poster where it left the Spanish text unchanged.
We can even save our workflow, and use it in another application.
Another awesome multimodal workflow we can create is converting speech to text, then analyzing whether it contains positive or negative sentiment. Once again, I'll use the same app we just created to add the workflow. I'm renaming it to ASR sentiment, which is short for automatic speech recognition sentiment analysis. I grab an audio-to-text model, connect the wiring, and then drag in and connect a text classifier.
I then select the first model in the chain and search for an English audio-to-text model, then pick the Wav2vec model and choose the most recent version.
Then I click the text classifier model, search for sentiment, and open up the full list of choices from Community to choose the sentiment-analysis DistilBERT model, again selecting the most recent version.
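This chain has the same shape as the first one: audio goes in, a transcript comes out of the first model, and a sentiment label comes out of the second. A toy sketch with stand-ins (the keyword-matching classifier and the pretend transcript are mine for illustration; the real workflow uses the hosted Wav2vec and DistilBERT models):

```python
def speech_to_text(audio_path: str) -> str:
    """Toy stand-in for the audio-to-text model."""
    return "good morning and a great presentation"  # pretend transcript

POSITIVE = {"good", "great", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "sad"}

def sentiment(text: str) -> str:
    """Toy keyword-based classifier; ties count as positive here."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"

def asr_sentiment(audio_path: str) -> str:
    # The wiring: audio -> transcript -> sentiment label.
    return sentiment(speech_to_text(audio_path))

print(asr_sentiment("demo.wav"))  # positive
```

Again, the workflow editor wires the real models together the same way, no code required.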
I save it, and again I'll bring in some examples I'd previously set up to test this workflow.
Let's take a listen to the first one. "A good morning and a great presentation" is definitely a positive statement, and the prediction shows that it is!
And this concludes this tutorial! Two multimodal workflows created in just a few minutes, and that's the power of Clarifai Community.