Clarifai Release 8.4

By Ian Kelk

Clarifai Release 8.4

This month we've added new models for caption generation, printed OCR, grammar correction, and logo detection!

 

Model: Logo Detection with YOLOv5 (logos-yolov5)

Find logos in images and video with this model! It was trained on the combination of LogoDet3k, OpenLogos, and Clarifai's internal logo dataset for a total of nearly 3,500 logos that can be detected. 

 

Try it here!


 

Model: Image Captioning with BLIP (general-english-image-caption-blip)

BLIP is a new pre-training framework for unified vision-language understanding and generation, which achieves state-of-the-art results on a wide range of vision-language tasks. This is a model from Salesforce that generates image captions; for a full deep dive check out their blog post or the research paper.

 

Try it here!

Model: Image Recognition with Vision Transformer (general-image-recognition-vit)

The vision transformer is a great all-purpose solution for most visual recognition needs with industry-leading performance.

 

Try it here!

Model: Optical Character Recognition with Transformer OCR (ocr-document-english-printed-trocr-large)

Optical Character Recognition model based on Microsoft's Transformer OCR for recognizing printed texts. The TrOCR model was fine-tuned on the SROIE dataset. It was introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. and first released in this repository.

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image encoder was initialized from the weights of BEiT, while the text decoder was initialized from the weights of RoBERTa.

Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. One also adds absolute position embeddings before feeding the sequence to the layers of the Transformer encoder. Next, the Transformer text decoder autoregressively generates tokens.

 

Try it here!

Model: English Grammar Correction with T5 (english-grammar-correction-t5)

Human and machine generated text often suffer from grammatical and/or typographical errors. It can be spelling, punctuation, grammatical or word choice errors. Gramformer is a library that exposes 3 separate interfaces to a family of algorithms to detect, highlight and correct grammar errors. To make sure the corrections and highlights recommended are of high quality, it comes with a quality estimator. You can use Gramformer in one or more areas mentioned under the "use-cases" section below or any other use case as you see fit. Gramformer stands on the shoulders of giants, it combines some of the top notch researches in grammar correction. 

 

Try it here!

video-scene-detection-car

Explore The Clarifai Community

Discover, build and share AI models, workflows and app components; powered by our low code, no code platform. Browse pre-built models from Hugging Face, Facebook, Google, Clarifai, and more.

Start browsing today

community-building-blocks-thumbnail

Clarifai 8.4 with new models for caption generation, printed OCR, grammar correction, and logo detection.

Subscribe to updates

Recent Posts