March 8, 2022

Clarifai Release 8.2


The Clarifai platform has been updated to Release 8.2! This release adds more NLP models for text processing, generation, detection, and inference.

Model: Text Summarization English Pegasus Model

Pegasus is a state-of-the-art model for abstractive text summarization. Abstractive summarization generates entirely new text that summarizes a source text, whereas extractive summarization pulls specific sentences out of the source to build the summary.
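To make the distinction concrete, here is a minimal sketch of the extractive approach in Python. It is a toy illustration only, not how Pegasus or any Clarifai model works: the function name `extractive_summary` and the word-frequency scoring are assumptions made for this example. The point to notice is that every sentence in the output is copied verbatim from the input, whereas an abstractive model such as Pegasus writes new sentences.

```python
import re
from collections import Counter


def extractive_summary(text: str, max_sentences: int = 1) -> str:
    """Return the highest-scoring sentences of `text`, copied unchanged.

    Hypothetical helper for illustration; scoring is a simple
    average word frequency, not a production-quality method.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole text (lowercased, letters and apostrophes).
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best, then restore source order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = "Pegasus writes summaries. Pegasus summaries are abstractive. Cats sleep."
    print(extractive_summary(sample, max_sentences=2))
```

Because the summary is assembled only from existing sentences, an extractive method like this cannot rephrase or merge ideas; that is the gap abstractive models are designed to close.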

From Google's Blog:

Students are often tasked with reading a document and producing a summary (for example, a book report) to demonstrate both reading comprehension and writing ability. This abstractive text summarization is one of the most challenging tasks in natural language processing, involving understanding of long passages, information compression, and language generation. The dominant paradigm for training machine learning models to do this is sequence-to-sequence (seq2seq) learning, where a neural network learns to map input sequences to output sequences. While these seq2seq models were initially developed using recurrent neural networks, Transformer encoder-decoder models have recently become favored as they are more effective at modeling the dependencies present in the long sequences encountered in summarization.



Model: News Summarization Russian MBart

A news summarization model trained on gazeta.ru news stories. It may be useful for customers in Media & Internet and Publishing, especially news organizations. Applications may be relevant for Content Moderation and may aid with Sentiment Analysis.

For more implementation details, please see the Colab link or Dataset for Automatic Summarization of Russian News.

Model: Paddle OCR Models of 28 Languages

Paddle OCR is an awesome multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices.

For more information, see the GitHub page and the source paper. Paddle OCR may be useful for customers in Banking, Financial Services, (P&C) Insurance, and Business Services (Transactional Finance, Invoice Capture, Process Claims). See more popular use cases by industry here.

Model: MT5 Spanish Text Summarization

Summarization is the task of condensing a piece of text into a shorter version while preserving its key information and meaning. Because manual summarization is time-consuming and laborious, automating the task is becoming increasingly popular, which in turn strongly motivates academic research. This model is based on mt5-small and was trained on the Spanish section of MLSum (https://paperswithcode.com/sota/abstractive-text-summarization-on-mlsum).

For more information see this page.