New
Clarifai is recognized as a Leader in The Forrester Wave™: Computer Vision Tools, Q1 2024
May 3, 2022

Clarifai Release 8.4

Table of Contents:

Clarifai Release 8.4

This month we've added new models for caption generation, printed OCR, grammar correction, and logo detection!

 

Model: Logo Detection with YOLOv5 (logos-yolov5)

Find logos in images and video with this model! It was trained on the combination of LogoDet3k, OpenLogos, and Clarifai's internal logo dataset for a total of nearly 3,500 logos that can be detected. 

 

Try it here!


YOLO Logo Detection Model

 

Model: Image Captioning with BLIP (general-english-image-caption-blip)

BLIP is a new pre-training framework for unified vision-language understanding and generation, which achieves state-of-the-art results on a wide range of vision-language tasks. This is a model from Salesforce that generates image captions; for a full deep dive check out their blog post or the research paper.

 

Try it here!

Flying Cat

Model: Image Recognition with Vision Transformer (general-image-recognition-vit)

The vision transformer is a great all-purpose solution for most visual recognition needs with industry-leading performance.

 

Try it here!

General Image Recognition Model

 

Explore Clarifai Community

Discover, build and share AI models, workflows and app components. Browse 100s of pre-trained models from Clarifai and leading AI builders.

Start browsing today

home-clarifai-community-hero

Model: Optical Character Recognition with Transformer OCR (ocr-document-english-printed-trocr-large)

Optical Character Recognition model based on Microsoft's Transformer OCR for recognizing printed texts. The TrOCR model was fine-tuned on the SROIE dataset. It was introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. and first released in this repository.

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image encoder was initialized from the weights of BEiT, while the text decoder was initialized from the weights of RoBERTa.

Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Next, the Transformer text decoder autoregressively generates tokens.

 

Try it here!

California license plate

Model: English Grammar Correction with T5 (english-grammar-correction-t5)

Human and machine generated text often suffer from grammatical and/or typographical errors. It can be spelling, punctuation, grammatical or word choice errors. Gramformer is a library that exposes 3 separate interfaces to a family of algorithms to detect, highlight and correct grammar errors. To make sure the corrections and highlights recommended are of high quality, it comes with a quality estimator. You can use Gramformer in one or more areas mentioned under the "use-cases" section below or any other use case as you see fit. Gramformer stands on the shoulders of giants, it combines some of the top notch researches in grammar correction. 

 

Try it here!

Grammar correction model