The Clarifai platform has been updated to Release 8.0! We've added incredible new state-of-the-art models for both image and video captioning, as well as handwritten optical character recognition (OCR), that you can start using today.
A shortcoming of current open-source OCR libraries such as EasyOCR and PaddleOCR is that you need to know the language beforehand. EasyOCR does support some specific combinations of languages (e.g., Japanese and English), but it doesn't allow arbitrary combinations (e.g., English, Japanese, and Arabic together).
Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results. However, many existing methods focus primarily on Latin-alphabet languages, often only on case-insensitive English characters. This model takes an E2E approach, Multiplexed Multilingual Mask TextSpotter, which performs script identification at the word level and handles different scripts with different recognition heads, all while maintaining a unified loss that simultaneously optimizes script identification and the multiple recognition heads. The method outperforms a single-head model with a similar number of parameters on end-to-end recognition tasks, and achieves state-of-the-art results on the MLT17 and MLT19 joint text detection and script identification benchmarks.
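To make the multiplexing idea concrete, here is a toy sketch of the routing step: a script-identification head first classifies each detected word, and the word crop is then sent only to the recognition head for that script. Everything below (the `HEADS` dict, `identify_script`, and the fake "crops") is an illustrative assumption, not the paper's actual API; real heads would be neural recognizers operating on pixels.

```python
# One recognition "head" per script; stand-in functions instead of real models.
HEADS = {
    "latin":  lambda crop: crop["text"].upper(),  # pretend Latin recognizer
    "arabic": lambda crop: crop["text"][::-1],    # pretend Arabic recognizer
}

def identify_script(crop):
    """Stand-in for the word-level script-identification head."""
    return crop["script"]  # a real model would predict this from the image

def recognize(crops):
    results = []
    for crop in crops:
        script = identify_script(crop)        # 1) classify the script
        head = HEADS[script]                  # 2) select the matching head
        results.append((script, head(crop)))  # 3) run only that head
    return results

crops = [{"script": "latin", "text": "hello"},
         {"script": "arabic", "text": "abc"}]
print(recognize(crops))  # [('latin', 'HELLO'), ('arabic', 'cba')]
```

During training, the real model backpropagates a unified loss through both the script-identification head and whichever recognition heads were selected, which is what lets one network serve arbitrary mixtures of scripts.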
PaddleOCR aims to create multilingual, practical, leading OCR tools that help users train better models and put them into practice.
We’re implementing OpenAI's CLIP, a neural network that efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT-2 and GPT-3.
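Mechanically, CLIP's zero-shot classification scores an image embedding against one text embedding per candidate label (e.g., "a photo of a dog") by cosine similarity, then softmaxes the scaled scores. The sketch below shows that scoring step with toy 3-d vectors standing in for real CLIP embeddings; the `zero_shot` function and the temperature value are illustrative assumptions, not CLIP's exact internals.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def zero_shot(image_emb, label_embs, temperature=100.0):
    """Score each label by scaled cosine similarity, then softmax."""
    logits = {lab: temperature * cosine(image_emb, e)
              for lab, e in label_embs.items()}
    m = max(logits.values())  # subtract max for numerical stability
    exps = {lab: math.exp(v - m) for lab, v in logits.items()}
    z = sum(exps.values())
    return {lab: v / z for lab, v in exps.items()}

# Toy embeddings: the image vector points mostly along the "dog" direction.
image_emb = [0.9, 0.1, 0.0]
labels = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
}
probs = zero_shot(image_emb, labels)
best = max(probs, key=probs.get)  # "a photo of a dog"
```

Because the labels are just text prompts, swapping in a new benchmark's category names requires no retraining, which is what makes the approach "zero-shot."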
These models are used as part of CLIP.
Microsoft's TrOCR is an encoder-decoder model consisting of an image Transformer encoder and a text Transformer decoder, delivering state-of-the-art optical character recognition (OCR) on single text-line images. This particular model is fine-tuned on IAM, a dataset of annotated handwritten text images.
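For readers who want to try the model directly, here is a minimal sketch using the Hugging Face `transformers` wrappers for TrOCR (the `microsoft/trocr-base-handwritten` checkpoint is the public IAM fine-tune). The helper function name and the example image path are our own; running it downloads the model weights on first use.

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

def read_handwriting(image_path, checkpoint="microsoft/trocr-base-handwritten"):
    """Transcribe a single handwritten text-line image with TrOCR."""
    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    image = Image.open(image_path).convert("RGB")
    # The processor resizes/normalizes the image for the Transformer encoder.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # The text decoder generates the transcription autoregressively.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Example (hypothetical path to a cropped handwritten line):
# print(read_handwriting("line.png"))
```

Note that TrOCR expects a pre-cropped single line of text; for full pages you would first run a text-detection step and feed each line crop through the function separately.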