Open Source Bedrock Agent Evaluation is an evaluation framework for Amazon Bedrock agent tool use and chain-of-thought reasoning, with observability dashboards in Langfuse.
The related framework at https://github.com/awslabs/agent-evaluation implements an LLM agent (evaluator) that orchestrates conversations with your own agent (target) and evaluates the responses during the conversation.
Our repository provides the following additional features:
- Test your own Bedrock Agent with custom questions
- Provides the option of LLM-as-a-judge without a ground truth reference
- Includes both Agent Goal metrics for chain-of-thought reasoning and task-specific metrics for RAG, Text2SQL, and custom tools
- Observability through Langfuse integration, including latency and cost information
- Dashboards for comparing agents across multiple Bedrock LLMs
There are two ways to set up this repository:
- Clone this repo to a SageMaker notebook instance
- Clone this repo locally and set up AWS CLI credentials for your AWS account
Set up a Langfuse account using the cloud offering (https://www.langfuse.com) or the self-hosted option for AWS (https://github.com/aws-samples/deploy-langfuse-on-ecs-with-fargate/tree/main/langfuse-v3)
Create an organization in Langfuse
Create a project within your Langfuse organization
Save your Langfuse project keys (Secret Key, Public Key, and Host) to use in the configuration file later
If you are using the self-hosted option and want to see model costs, you must create a model definition in Langfuse for the LLM used by your agent; instructions can be found at https://langfuse.com/docs/model-usage-and-cost#custom-model-definitions
To run from a SageMaker notebook instance:
- Create a SageMaker notebook instance in your AWS account
- Open a terminal and navigate to the SageMaker/ folder within the instance
cd SageMaker/
- Clone this repository
git clone https://github.com/aws-samples/open-source-bedrock-agent-evaluation.git
- Navigate to the repository and install the necessary requirements
cd open-source-bedrock-agent-evaluation/
pip3 install -r requirements.txt
To run locally instead:
- Clone this repository
git clone https://github.com/aws-samples/open-source-bedrock-agent-evaluation.git
- Navigate to the repository and install the necessary requirements
cd open-source-bedrock-agent-evaluation/
pip3 install -r requirements.txt
- Set up the AWS CLI to access AWS account resources locally (https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html)
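If the AWS CLI has not been configured yet, a minimal interactive setup (shown here with placeholder values) looks like:
# Configure credentials and a default region for the AWS CLI
aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: <your-region>
# Default output format [None]: json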
There are two ways to use the framework:
- Bring your own agent to evaluate
- Create sample agents from this repository and run evaluations
To evaluate your own agent:
- Bring the existing agent you want to evaluate (RAG and Text2SQL evaluations are currently built in)
- Create a dataset file for evaluations, either manually or using the generator (refer to data_files/sample_data_file.json for the required format)
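For illustration only, an entry in the dataset file might look like the sketch below; the field names shown are assumptions, so treat data_files/sample_data_file.json as the authoritative schema.
{
    "question_id": 1,
    "question_type": "RAG",
    "question": "<your evaluation question>",
    "ground_truth": "<expected answer, optional when using LLM-as-a-judge>"
}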
- Copy the template configuration file and fill in the necessary information
cp config_tpl.env.tpl config.env
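The exact variable names are defined in the template; the snippet below is only an assumed sketch of how the Langfuse keys saved earlier might be filled in, using the standard LANGFUSE_* environment variable names.
# Assumed example - replace with the keys and host from your Langfuse project
# (use your self-hosted URL for LANGFUSE_HOST if applicable)
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com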
- Run driver.py to execute the evaluation job against the dataset
python3 driver.py
- Check your Langfuse project console to see the evaluation results!
To use pre-created sample agents instead, follow the instructions in the Blog Sample Agents README. This is a guided way to run the evaluation framework on pre-created Bedrock Agents.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.