RAG
Check out this short video from the Elastic Snackable Series.
Retrieval Augmented Generation (RAG) is a technique for improving language model responses by grounding the model with additional, verifiable sources of information. It works by first retrieving relevant context from an external datastore, which is then added to the model’s context window.
RAG is a form of in-context learning, where the model learns from information provided at inference time. Compared to fine-tuning or continuous pre-training, RAG can be implemented more quickly and cheaply, and offers several advantages.
RAG sits at the intersection of information retrieval and generative AI. Elasticsearch is an excellent tool for implementing RAG, because it offers various retrieval capabilities, such as full-text search, vector search, and hybrid search, as well as other tools like filtering, aggregations, and security features.
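The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration only: the in-memory document list, the term-overlap scorer, and the `build_prompt` helper are stand-ins invented for this sketch; in a real system the retrieval step would be an Elasticsearch query and the prompt would be sent to a language model.

```python
# Toy retrieve-then-generate sketch (not Elastic APIs).
DOCS = [
    "Elasticsearch supports full-text, vector, and hybrid search.",
    "RAG grounds a language model with retrieved context.",
    "Role-based access control restricts which documents a user can read.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive term overlap with the query (stand-in for search)."""
    terms = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: -len(terms & set(d.lower().split())))
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Insert the retrieved passages into the model's context window."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

question = "What kinds of search does Elasticsearch support?"
prompt = build_prompt(question, retrieve(question))
```

The prompt produced this way is what gets passed to the language model; the model's answer is then grounded in the retrieved passages rather than only its training data.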
Implementing RAG with Elasticsearch has several advantages:
- Improved context: Enables grounding the language model with additional, up-to-date, and/or private data.
- Reduced hallucination: Helps minimize factual errors by enabling models to cite authoritative sources.
- Cost efficiency: Requires less maintenance compared to fine-tuning or continuously pre-training models.
- Built-in security: Controls data access by leveraging Elasticsearch's user authorization features, such as role-based access control and field/document-level security.
- Simplified response parsing: Eliminates the need for custom parsing logic by letting the language model handle parsing Elasticsearch responses and formatting the retrieved context.
- Flexible implementation: Works with basic full-text search, and can be gradually updated to add more advanced and computationally intensive semantic search capabilities.
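To make the hybrid-search option concrete, the sketch below builds an Elasticsearch 8.x-style request body that combines a full-text `match` clause with a `knn` vector clause. The index fields (`body`, `body_embedding`) and the query vector are illustrative assumptions, not fields from any real index; a real request would also be sent via the Elasticsearch client rather than just constructed.

```python
# Sketch: hybrid search request body combining lexical and vector retrieval.
def hybrid_query(text: str, vector: list[float], k: int = 5) -> dict:
    return {
        # Lexical full-text part over an assumed "body" text field.
        "query": {"match": {"body": {"query": text}}},
        # Vector part over an assumed "body_embedding" dense_vector field.
        "knn": {
            "field": "body_embedding",
            "query_vector": vector,
            "k": k,
            "num_candidates": 10 * k,
        },
        "size": k,
    }

request = hybrid_query("how does RAG work", [0.1, 0.2, 0.3])
```

Starting with only the `query` clause gives plain full-text retrieval; the `knn` clause can be added later once embeddings are indexed, which is the gradual upgrade path the list above describes.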
The following diagram illustrates a simple RAG system using Elasticsearch.