With the rapid adoption of AI and the innovation that is happening around us, we need the ability to take large amounts of data, contextualize it, and enable it to be searched with meaning.
This is where embeddings come into play: they are vector representations of data generated by machine learning models such as Large Language Models (LLMs). Vectors are mathematical representations of objects or data points in a multi-dimensional space, where each dimension corresponds to a specific feature or attribute.
In the context of machine learning, these features represent different dimensions of the data that are essential for understanding patterns, relationships, and underlying structures.
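To make this concrete, here is a minimal sketch of how closeness between vectors can be measured with cosine similarity. The three-dimensional vectors and their meanings are hand-written assumptions for illustration; real embeddings are learned by a model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors; dimensions loosely mean (is_animal, is_vehicle, is_small).
cat = [0.9, 0.0, 0.8]
dog = [0.9, 0.1, 0.5]
truck = [0.0, 1.0, 0.1]

# Related concepts point in similar directions, so they score higher.
print(cosine_similarity(cat, dog) > cosine_similarity(cat, truck))
```

A score near 1 means the two vectors point in almost the same direction, and a score near 0 means they have little in common.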
Managing all these representations is challenging, and this is ultimately where the strength of a vector database lies: its ability to store and retrieve large volumes of data as vectors in a multi-dimensional space.
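As a rough illustration of the store-and-retrieve idea, the sketch below implements a toy in-memory vector store with brute-force search. The class name `InMemoryVectorStore`, its methods, and the sample vectors are all hypothetical. Production vector databases generally rely on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


class InMemoryVectorStore:
    """Toy vector store: brute-force search over unit-length vectors."""

    def __init__(self):
        self._items = []  # list of (item_id, unit_vector)

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot use a zero vector")
        return [x / norm for x in vector]

    def add(self, item_id, vector):
        # Store normalized vectors so a dot product equals cosine similarity.
        self._items.append((item_id, self._normalize(vector)))

    def search(self, query, top_k=2):
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), item_id)
            for item_id, vec in self._items
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [(item_id, score) for score, item_id in scored[:top_k]]


store = InMemoryVectorStore()
store.add("doc-cats", [0.9, 0.0, 0.8])
store.add("doc-dogs", [0.9, 0.1, 0.5])
store.add("doc-trucks", [0.0, 1.0, 0.1])

results = store.search([0.8, 0.0, 0.9], top_k=2)
print([item_id for item_id, _ in results])
```

Brute force is fine for a handful of vectors, but its cost grows linearly with the collection, which is one reason dedicated vector databases exist.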
This opens up a lot of use cases such as Semantic Search, Multimodal Search, and Retrieval Augmented Generation (RAG).
Large Language Models have their own limitations. They are not up to date, as they have only been trained on data up to a certain point in time. For example, GPT-4 has a knowledge cutoff of April 2023; if you ask a question that falls outside its training data, it will either state that it doesn't know and cite its training cutoff, or it might hallucinate a plausible-sounding answer. Also, LLMs are trained for generalized tasks and lack domain-specific knowledge, such as knowledge of your own data.
Imagine you're reading a scientific article and you've just come across a term you're not familiar with. Naturally, you'd look it up on Wikipedia or search online to find out what it is, and then use that knowledge to continue your reading. RAG works in a similar fashion for LLMs when they're presented with topics or questions they haven't been trained on.
Here's how it works, step-by-step:
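In broad strokes, the flow is: embed the question, retrieve the most relevant documents, add them to the prompt, and let the LLM generate an answer. The sketch below walks through that flow under heavy assumptions: `embed` is a word-count stand-in for a real embedding model, `generate` is a hypothetical placeholder for an LLM call, and the documents are made up.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding (word counts); a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, top_k=1):
    """Step 1 and 2: embed the question and fetch the closest documents."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_docs):
    """Step 3: augment the prompt with the retrieved context."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def generate(prompt):
    """Step 4: hypothetical placeholder for an LLM call; it only echoes the prompt."""
    return f"[an LLM would answer here, given]\n{prompt}"


documents = [
    "The return window for online orders is 30 days.",
    "Our support team is available on weekdays from 9 to 5.",
]
question = "How many days is the return window?"

context = retrieve(question, documents)
prompt = build_prompt(question, context)
print(generate(prompt))
```

In a real system, the retrieval step would query a vector database instead of re-embedding every document on each request.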
This RAG process is particularly useful in situations where being up to date is necessary, for example when providing the latest information in a rapidly changing field like technology or current affairs. It empowers the LLM to fetch and use the most recent and relevant information beyond its original training data. Compared to building your own foundation model or fine-tuning an existing model for context-specific issues, RAG is cost-effective and easier to implement.
The three components for building a RAG system are an embedding model, an LLM, and a vector database. Clarifai provides all three in a single platform, so you can build RAG systems seamlessly. Check out this notebook to build RAG for Generative Q&A using Clarifai.
Semantic search uses vectors to search and retrieve text, images, and videos. Compared to traditional keyword search, vector search can yield more relevant results and execute faster. In a keyword search, the search engine uses specific keywords or phrases to match against the text data in a document or image metadata. This approach relies on exact matches between the search query and the data being searched, which makes it hard to find content that is conceptually or visually similar but worded differently.
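The difference can be seen in a small sketch that runs the same query through both approaches. The two-dimensional embeddings in `EMBEDDINGS` are hand-assigned assumptions standing in for the output of a real embedding model, and the function names are illustrative only.

```python
import math

# Hypothetical embeddings; dimensions loosely mean (vehicle-ness, food-ness).
EMBEDDINGS = {
    "car": [0.95, 0.05],
    "Used automobile for sale": [0.90, 0.10],
    "Fresh apple pie recipe": [0.05, 0.95],
}
documents = ["Used automobile for sale", "Fresh apple pie recipe"]


def keyword_search(query, docs):
    """Return documents that contain the query as an exact word."""
    return [d for d in docs if query.lower() in d.lower().split()]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_search(query, docs):
    """Return the document whose embedding is closest to the query embedding."""
    q = EMBEDDINGS[query]
    return max(docs, key=lambda d: cosine(q, EMBEDDINGS[d]))


keyword_hits = keyword_search("car", documents)  # no document contains the word "car"
semantic_hit = vector_search("car", documents)   # closest in meaning

print(keyword_hits)
print(semantic_hit)
```

The keyword search finds nothing because no document contains the literal word, while the vector search still surfaces the automobile listing because its embedding sits close to the query.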
One of the key advantages of semantic search is its ability to search for similar images or videos, even when the search terms themselves are not exact matches. This can be especially useful when searching for highly specific unstructured data, such as a particular product or location.
Clarifai offers vector search capabilities that support text-to-text, image-to-image, and other modalities, as long as the inputs are represented as embeddings. For visual search, you can access this feature in the Portal Grid View, where searching with one input returns similar inputs, ordered by decreasing similarity based on visual cues and features.
Multimodal search is a specific case of semantic search and an emerging frontier in information retrieval and data science. It represents a paradigm shift from traditional search methods, allowing users to query across diverse data types, such as text, images, audio, and video. It breaks down the barriers between different data modalities, offering a more holistic and intuitive search experience.
A popular application of multimodal search is text-to-image search, where natural language is used as a prompt to form the query and search over a collection of images.
Clarifai offers Smart Caption Search which lets you rank, sort, and retrieve images based on text queries. Smart Caption Search transforms your human-generated sentences or thoughts into powerful search queries across your inputs. Simply input a descriptive text that best describes the images you want to search for, and the most relevant matches associated with that query will be displayed.
Performing searches using full text allows you to provide much more in-depth context and retrieve more relevant results than other types of searches.
Vector databases are incredibly powerful for efficiently managing vector embeddings and extending the capabilities of LLMs. In this article, we learned about applications of vector databases, such as RAG, Semantic Search, and Multimodal Search, as well as how you can leverage them with Clarifai. Check out this blog to learn more about Clarifai's vector database.