RAG Evaluator is a Python library for evaluating Retrieval-Augmented Generation (RAG) systems. It provides various metrics to evaluate the quality of generated text against reference text.
You can install the library using pip:
```bash
pip install rag-evaluator
```
Here's how to use the RAG Evaluator library:
```python
from rag_evaluator import RAGEvaluator

# Initialize the evaluator
evaluator = RAGEvaluator()

# Input data
question = "What are the causes of climate change?"
response = "Climate change is caused by human activities."
reference = "Human activities such as burning fossil fuels cause climate change."

# Evaluate the response
metrics = evaluator.evaluate_all(question, response, reference)

# Print the results
print(metrics)
```
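If you need to score more than one response, the same `evaluate_all` call can be applied in a loop. The sketch below is illustrative only: the sample data is made up, and it assumes the returned metrics can be collected and printed as in the example above.

```python
from rag_evaluator import RAGEvaluator

evaluator = RAGEvaluator()

# Illustrative sample data; replace with your own RAG questions, responses, and references.
samples = [
    {
        "question": "What are the causes of climate change?",
        "response": "Climate change is caused by human activities.",
        "reference": "Human activities such as burning fossil fuels cause climate change.",
    },
    {
        "question": "What is the greenhouse effect?",
        "response": "The greenhouse effect traps heat in the atmosphere.",
        "reference": "Greenhouse gases trap heat in Earth's atmosphere, warming the planet.",
    },
]

# Evaluate each sample and collect the results.
results = [
    evaluator.evaluate_all(s["question"], s["response"], s["reference"])
    for s in samples
]

# Print the scores alongside the question they belong to.
for sample, metrics in zip(samples, results):
    print(sample["question"])
    print(metrics)
```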
To run the Streamlit web app:

- `cd` into the Streamlit app folder.
- Create a virtual environment (e.g., `python -m venv venv`).
- Activate it (e.g., `source venv/bin/activate` on macOS/Linux, or `venv\Scripts\activate` on Windows).
- Install the dependencies (typically `pip install -r requirements.txt`).
- Run the app with Streamlit (e.g., `streamlit run app.py`, adjusting the filename to the app's entry point).
The following metrics are provided by the library:
- BLEU: Measures the overlap between the generated output and reference text based on n-grams.
- ROUGE-1: Measures the overlap of unigrams between the generated output and reference text.
- BERT Score: Evaluates the semantic similarity between the generated output and reference text using BERT embeddings.
- Perplexity: Measures how well a language model predicts the text.
- Diversity: Measures the uniqueness of bigrams in the generated output.
- Racial Bias: Detects the presence of biased language in the generated output.
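Because `evaluate_all` returns these scores together, one common pattern is to flag responses whose scores fall below a chosen threshold. The sketch below is a minimal example, assuming the result behaves like a dictionary mapping metric names to numeric scores; the exact key names and score scales depend on the installed version of the library.

```python
from rag_evaluator import RAGEvaluator

evaluator = RAGEvaluator()

metrics = evaluator.evaluate_all(
    "What are the causes of climate change?",
    "Climate change is caused by human activities.",
    "Human activities such as burning fossil fuels cause climate change.",
)

# Assumes the result is a dict-like mapping of metric name -> score.
# Note: for some metrics (e.g., perplexity, racial bias) lower is better,
# so a real quality gate would treat those metrics separately.
THRESHOLD = 0.5

low_scores = {
    name: score
    for name, score in metrics.items()
    if isinstance(score, (int, float)) and score < THRESHOLD
}

print("Metrics below threshold:", low_scores)
```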
To run the tests, use the following command:
```bash
python -m unittest discover -s rag_evaluator -p "test_*.py"
```