Open Source Bedrock Agent Evaluation is an evaluation framework for Amazon Bedrock agent tool use and chain-of-thought reasoning, with observability dashboards in Langfuse.
The related framework at https://github.com/awslabs/agent-evaluation implements an LLM agent (evaluator) that orchestrates conversations with your own agent (target) and evaluates the responses during the conversation.
Our repository provides the following additional features:
- Test your own Bedrock Agent with custom questions
- Provides the option of LLM-as-a-judge without a ground truth reference
- Includes both Agent Goal metrics for chain-of-thought reasoning and task-specific metrics for RAG, Text2SQL, and custom tools
- Observability through Langfuse integration, including latency and cost information
- Dashboards for comparing agents across multiple Bedrock LLMs
There are two ways to set up this repository:
- Clone this repo to a SageMaker notebook instance
- Clone this repo locally and set up AWS CLI credentials for your AWS account
Set up a Langfuse account using the cloud offering (https://www.langfuse.com) or the self-hosted option for AWS (https://github.com/aws-samples/deploy-langfuse-on-ecs-with-fargate/tree/main/langfuse-v3)
Create an organization in Langfuse
Create a project within your Langfuse organization
Save your Langfuse project keys (Secret Key, Public Key, and Host) to use in the configuration file later
If you are using the self-hosted option and want to see model costs, you must create a model definition in Langfuse for the LLM used by your agent; instructions can be found at https://langfuse.com/docs/model-usage-and-cost#custom-model-definitions
To run from a SageMaker notebook instance:
- Create a SageMaker notebook instance in your AWS account
- Open a terminal and navigate to the SageMaker/ folder within the instance
cd SageMaker/
- Clone this repository
git clone https://github.com/aws-samples/open-source-bedrock-agent-evaluation.git
- Navigate to the repository and install the necessary requirements
cd open-source-bedrock-agent-evaluation/
pip3 install -r requirements.txt
To run locally instead:
- Clone this repository
git clone https://github.com/aws-samples/open-source-bedrock-agent-evaluation.git
- Navigate to the repository and install the necessary requirements
cd open-source-bedrock-agent-evaluation/
pip3 install -r requirements.txt
- Set up the AWS CLI to access AWS account resources locally (https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html)
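If the AWS CLI has not been configured yet, a minimal interactive setup (shown here with placeholder values) looks like:
# Configure credentials and a default region for the AWS CLI
aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: <your-region>
# Default output format [None]: json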
There are two ways to use the framework:
- Bring your own agent to evaluate
- Create sample agents from this repository and run evaluations
To evaluate your own agent:
- Bring the existing agent you want to evaluate (RAG and Text2SQL evaluations are currently built in)
- Create a dataset file for evaluations, either manually or using the generator (refer to data_files/sample_data_file.json for the required format)
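For illustration only, an entry in the dataset file might look like the sketch below; the field names shown are assumptions, so treat data_files/sample_data_file.json as the authoritative schema.
{
    "question_id": 1,
    "question_type": "RAG",
    "question": "<your evaluation question>",
    "ground_truth": "<expected answer, optional when using LLM-as-a-judge>"
}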
- Copy the template configuration file and fill in the necessary information
cp config_tpl.env.tpl config.env
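The exact variable names are defined in the template; the snippet below is only an assumed sketch of how the Langfuse keys saved earlier might be filled in, using the standard LANGFUSE_* environment variable names.
# Assumed example - replace with the keys and host from your Langfuse project
# (use your self-hosted URL for LANGFUSE_HOST if applicable)
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com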
- Run driver.py to execute the evaluation job against the dataset
python3 driver.py
- Check your Langfuse project console to see the evaluation results!
To use pre-created sample agents instead, follow the instructions in the Blog Sample Agents README. This is a guided way to run the evaluation framework on pre-created Bedrock Agents.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.