3. Orchestration, RAG
Contents
3. Orchestration, RAG¶
Summary¶
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of Large Language Models (LLMs) by integrating an information retrieval component with the model’s text generation. This approach allows LLMs to access and utilize external, up-to-date data sources, thereby improving the accuracy and relevance of their responses. The RAG architecture includes an orchestration layer that manages the workflow, retrieval tools that fetch contextually relevant data, and the LLM itself, which generates responses based on the augmented prompt.
Key Concepts¶
Orchestration Layer : This layer manages the overall workflow of the RAG system, receiving user input, interacting with various components, and orchestrating the flow of information between them.
Retrieval Tools : These utilities provide relevant context for responding to user prompts, including knowledge bases for static information and API-based retrieval systems for dynamic data sources.
LLM : The Large Language Model is responsible for generating responses to user prompts, utilizing the context provided by the retrieval tools.
Knowledge Base Retrieval : Involves querying a vector store, a database optimized for textual similarity searches, to retrieve relevant information.
API-based Retrieval : Used to fetch contextually relevant data in real-time from data sources that allow programmatic access.
Prompting with RAG : Involves creating prompt templates with placeholders for user requests, system instructions, historical context, and retrieved context, which are filled by the orchestration layer before passing the prompt to the LLM.