Retrieval-Augmented Generation (RAG): the synergy of memory and creativity
One of the main challenges in implementing use cases based on textual generative AI is to provide it with external information. RAG techniques solve this problem by giving language models the ability to search for answers in external documentation on which they have not been trained.
What is RAG?
Although the name does not translate neatly, it means something like "generation augmented by retrieval". It consists of enriching the information available to a language model so that it can respond using additional knowledge it has never seen before, such as private financial data or internal company sales pitches. Not only this, but thanks to the advance of open source models it is possible to do it completely on-premise, so that all information can remain confidential.
The idea behind RAG is, given a user's prompt to the AI, to automatically search within a body of knowledge for which text fragments most closely resemble the question the user has asked, and use that collected information to construct a new prompt, which includes both the original question and additional relevant context extracted from the knowledge base. Finally, with this new prompt, a language model generates a response.
In this way, the language model can respond by relying on information beyond what was used for its training, such as PDF documents, emails, databases or websites.
How does RAG work?
There are different architectures, but the most basic and common one consists of three stages:
- Index creation: before queries can be made, the information must be pre-processed into a format that allows it to be searched at high speed. To do this, the texts are broken into smaller pieces called chunks. Each chunk is passed through an embedding model, which transforms it into a numeric vector that ideally captures what the chunk is about. In other words, given a good embedding model, text fragments that are semantically very similar should produce very similar vectors, even if the same meaning is expressed with completely different vocabulary; conversely, a pair of unrelated text fragments should produce very different vectors. These embedding models are the heart of RAG, and make it more effective than a simple keyword search: the same idea can be expressed with many different words, and the same word can mean things that have absolutely nothing to do with each other.
- Search: given a user's query to the system, a vector search is performed to return the text fragments that most closely match it. This stage is critical for proper functioning, and there are dozens of techniques to make it work better: for example, combining the vector search with a keyword search (using BM25 and merging the results with Reciprocal Rank Fusion, RRF), or generating a hypothetical answer and searching with it, since it will resemble the relevant chunks more closely than the original question does (Hypothetical Document Embeddings, HyDE). There are so many possibilities that this alone is a topic for a whole article, so they will not be detailed here. But these techniques should not be underestimated: getting this stage right is the most important point of the whole system.
- Response generation: finally, both the original question and the text fragments collected in the previous stage are used to generate an answer.
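The three stages above can be wired together in a few dozen lines of Python. As a sketch only: a toy bag-of-words embedding stands in for a real neural embedding model, and a prompt template stands in for the actual call to a language model; the sample documents and function names are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words frequency vector.
    A real RAG system would use a neural embedding model instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stage 1: index creation -- split the corpus into chunks and embed each one.
documents = [
    "Q3 revenue grew 12% driven by the new subscription plan.",
    "The sales pitch emphasises on-premise deployment and data privacy.",
    "Employee onboarding takes two weeks and includes security training.",
]
index = [(chunk, embed(chunk)) for chunk in documents]

# Stage 2: search -- embed the query and rank chunks by similarity.
def search(query, k=2):
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Stage 3: response generation -- build an augmented prompt for the LLM.
def build_prompt(question):
    context = "\n".join(f"- {c}" for c in search(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How much did revenue grow in Q3?"))
```

In a production system each piece would be replaced: the embedding by a model such as one from the sentence-transformers family, the list scan by a vector database, and the final prompt by a real LLM call, but the data flow stays the same.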
The whole system works just like copying and pasting the relevant text into ChatGPT along with the question to be asked, with the great advantage of not having to read and search for these context snippets manually.
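One of the search refinements mentioned earlier, merging a vector search and a BM25 keyword search with Reciprocal Rank Fusion, reduces to very little code once each search returns a ranked list of document ids. The scoring formula (each document earns 1/(k + rank) per ranking, with k commonly set to 60) is the standard one; the input rankings below are made up for illustration.

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: a document scores 1/(k + rank) in every
    ranking that contains it, and the scores are summed across rankings."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: one list from vector search, one from BM25.
vector_hits = ["doc_a", "doc_c", "doc_b"]
keyword_hits = ["doc_b", "doc_a", "doc_d"]

print(rrf_merge([vector_hits, keyword_hits]))  # doc_a first: high in both lists
```

Because RRF only looks at ranks, not raw scores, it sidesteps the problem of combining cosine similarities and BM25 scores that live on incompatible scales.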
Advantages of RAG
- Up-to-date knowledge: pure text generation models, while impressive in their ability to mimic human language, may not be up to date or lack information on specific topics. RAG allows the model to complement its creative capabilities with up-to-date and specific external data, resulting in more relevant and accurate answers rather than vague and generic ones.
- Expanded memory: instead of relying solely on the model's internal memory, RAG allows access to vast knowledge resources stored in databases, disorganised text documents, emails... In practice this gives the system an "infinite memory", where information is not limited to the data set on which the model was trained.
- Adaptability: RAG can customise its responses based on the information it retrieves. This makes it ideal for tasks such as personalised assistance, searching for technical information, generating specific content for different industries, among others.
- Hallucination reduction: one of the common problems in generative models is the tendency to produce responses that appear truthful but lack a basis in fact, known as "hallucinations". By incorporating a layer that retrieves verified data, RAG significantly reduces the occurrence of these incorrect answers.
RAG applications
RAG applications are wide and varied, spanning multiple sectors and use cases. Some of the highlights include:
- Customer support systems: with RAG, companies can provide their customers with more detailed and accurate answers to their queries, accessing manuals, product documentation or specific knowledge bases, improving customer satisfaction. This is probably the current No. 1 use case for RAG.
- Advanced specialised information search engines: for example, in the medical field, RAG models can help professionals access up-to-date research or medical articles when faced with complex clinical cases. Journalists can use RAG-based systems to access historical data or reports in real time, allowing them to generate informative and accurate articles on recent events or ongoing investigations.
- Generation of specialised content: whether in technical fields such as engineering, economics, science or simply answering an email, RAG allows you to create content that is not only consistent, but also backed by up-to-date and reliable data.
Challenges and future of RAG
Although RAG represents a significant advance in text generation, it still faces certain challenges. The correct selection of data sources is crucial to avoid the dissemination of erroneous or biased information. In addition, the efficiency of information retrieval is a determining factor in the quality of responses.
A lot of good decisions need to be made during system design, and a robust testing system needs to be in place to ensure that each change improves overall performance. That is why it is a good idea to develop RAG systems with an expert partner like Izertis, saving time, unnecessary rework and headaches.
Retrieval-Augmented Generation is a step towards creating AI systems that are not only capable of generating coherent and fluent text, but can also access and use external information in real time to provide accurate and relevant responses. With its ability to combine creativity with knowledge, RAG has the potential to transform entire industries, from customer service to scientific research, and mark a new era in human-machine interaction.