Retrieval-Augmented Generation (RAG) is a process that enriches the user's query and search terms so that the search engine returns more precise results. This ensures that the most relevant content is made available to the large language model, enabling it to generate accurate and contextually informed responses. Zammo has this functionality built into our product, making it easy for you to launch a RAG chatbot or IVR bot quickly.

[Figure: RAG overview]

The process begins by taking the user’s request, along with the previous chat history and knowledge base context, and passing it to the large language model (in this case, Azure OpenAI). The model uses this information to structure and optimize the query, making it more effective for retrieval. This optimized query is then sent to the search engine (Azure AI Search) to locate the most relevant content.
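The query-optimization step described above can be sketched as follows. This is a minimal illustration, not Zammo's actual implementation: the function names, the prompt wording, and the `llm` callable (a stand-in for a call to the large language model) are all assumptions made for the example.

```python
from typing import Callable, List


def build_rewrite_prompt(user_request: str, history: List[str], kb_context: str) -> str:
    """Combine the request, chat history, and knowledge base context into one prompt."""
    history_text = "\n".join(history) if history else "(no previous messages)"
    return (
        "Rewrite the user's request as a standalone search query.\n"
        f"Knowledge base context: {kb_context}\n"
        f"Chat history:\n{history_text}\n"
        f"User request: {user_request}\n"
        "Search query:"
    )


def optimize_query(
    user_request: str,
    history: List[str],
    kb_context: str,
    llm: Callable[[str], str],  # hypothetical: takes a prompt, returns the model's text
) -> str:
    """Ask the model for an optimized query; fall back to the raw request if it returns nothing."""
    prompt = build_rewrite_prompt(user_request, history, kb_context)
    query = llm(prompt).strip()
    return query or user_request
```

Passing the chat history matters because a follow-up such as "how do I reset it?" is not searchable on its own; the model needs the earlier turns to resolve what "it" refers to before the query goes to the search engine.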

Once the search results are retrieved, the large language model is prompted again. It incorporates the relevant search results, the query, the conversation history, and any available context to generate a highly informed and relevant response for the user.
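The second model call can be sketched in the same spirit. Again, this is an illustrative outline under stated assumptions rather than the product's real code: `build_answer_prompt`, `generate_response`, the prompt text, and the `llm` callable are hypothetical names introduced for the example.

```python
from typing import Callable, List


def build_answer_prompt(
    user_request: str,
    search_query: str,
    search_results: List[str],
    history: List[str],
    kb_context: str,
) -> str:
    """Assemble the final prompt from the search results, query, history, and context."""
    # Number each retrieved passage so the model can tell the sources apart.
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(search_results, start=1))
    history_text = "\n".join(history) if history else "(no previous messages)"
    return (
        "Answer the user's request using the search results below.\n"
        f"Knowledge base context: {kb_context}\n"
        f"Chat history:\n{history_text}\n"
        f"Search query: {search_query}\n"
        f"Search results:\n{sources or '(no search results)'}\n"
        f"User request: {user_request}\n"
        "Answer:"
    )


def generate_response(
    user_request: str,
    search_query: str,
    search_results: List[str],
    history: List[str],
    kb_context: str,
    llm: Callable[[str], str],  # hypothetical: takes a prompt, returns the model's text
) -> str:
    """Prompt the model a second time, now grounded in the retrieved content."""
    prompt = build_answer_prompt(user_request, search_query, search_results, history, kb_context)
    return llm(prompt).strip()
```

Keeping prompt assembly separate from the model call makes it easy to inspect exactly what the model sees, which is useful when tuning which pieces of context are included.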

Zammo’s built-in RAG capabilities manage the multiple steps of this process for you while keeping the flexibility to customize settings. This allows users to tailor the system to their specific needs without compromising functionality or accuracy.