
aws-samples/open-source-bedrock-agent-evaluation

Open Source Bedrock Agent Evaluation is an evaluation framework for Amazon Bedrock agent tool use and chain-of-thought reasoning, with observability dashboards in Langfuse.

Existing AWS assets

The existing AWS sample https://github.com/awslabs/agent-evaluation implements an LLM agent (evaluator) that orchestrates conversations with your own agent (target) and evaluates the responses during the conversation.

Our repository provides the following additional features:

Features

  • Test your own Bedrock Agent with custom questions (a minimal invocation sketch follows this list)
  • Option to use LLM-as-a-judge without a ground-truth reference
  • Agent Goal metrics for chain-of-thought reasoning, plus task-specific metrics for RAG, Text2SQL, and custom tools
  • Observability through Langfuse integration, including latency and cost information
  • A dashboard for comparing agents across multiple Bedrock LLMs
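As a minimal sketch of the agent invocation that the evaluation builds on (standard boto3 Bedrock Agents Runtime calls, not this repository's driver code; the agent IDs and question are placeholders):

# Minimal sketch: invoke a Bedrock agent with tracing enabled so that tool-use and
# chain-of-thought steps can be collected for evaluation. Not this repo's driver code.
import uuid
import boto3

client = boto3.client("bedrock-agent-runtime")
response = client.invoke_agent(
    agentId="YOUR_AGENT_ID",        # placeholder
    agentAliasId="YOUR_ALIAS_ID",   # placeholder
    sessionId=str(uuid.uuid4()),
    inputText="What was the total revenue in 2023?",
    enableTrace=True,               # emit trace events alongside the answer chunks
)

answer, traces = "", []
for event in response["completion"]:   # EventStream of chunk / trace events
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
    elif "trace" in event:
        traces.append(event["trace"])  # orchestration / reasoning steps to score
print(answer)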

Evaluation Workflow

[Diagram: evaluation workflow]

Evaluation Results in Langfuse

Dashboard

[Screenshot: example dashboard]

Panel of Traces

[Screenshot: example traces]

Individual Trace

[Screenshot: example trace]
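The framework sends evaluation results to Langfuse, where they appear as the traces and scores shown above. As a rough sketch only, assuming the v2-style Langfuse Python SDK (this is not the framework's actual logging code), recording one evaluation run looks roughly like this:

# Rough sketch: record one evaluation result as a Langfuse trace with a score.
# Assumes the v2-style Langfuse Python SDK; not this framework's actual code.
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST

answer = "Total revenue in 2023 was $1.2M."  # placeholder agent answer
trace = langfuse.trace(
    name="bedrock-agent-eval",
    input="What was the total revenue in 2023?",
    output=answer,
    metadata={"agent_id": "YOUR_AGENT_ID"},  # placeholder
)
trace.score(name="agent-goal-accuracy", value=0.8, comment="LLM-as-a-judge verdict")
langfuse.flush()  # ensure events are sent before the process exits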

Deployment Options

  1. Clone this repo to a SageMaker notebook instance
  2. Clone this repo locally and set up AWS CLI credentials for your AWS account

Pre-Requisites

  1. Set up a Langfuse account using the cloud service (https://www.langfuse.com) or the self-hosted option for AWS (https://github.com/aws-samples/deploy-langfuse-on-ecs-with-fargate/tree/main/langfuse-v3)

  2. Create an organization in Langfuse

  3. Create a project within your Langfuse organization

  4. Save your Langfuse project keys (Secret Key, Public Key, and Host) to use in the config (a quick key check sketch follows this list)

  5. If you are using the self-hosted option and want to see model costs, you must create a model definition in Langfuse for the LLM used by your agent; instructions: https://langfuse.com/docs/model-usage-and-cost#custom-model-definitions
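Before running anything, you can sanity-check the keys from step 4. This is a sketch only, assuming the Langfuse Python SDK; the key values below are placeholders:

# Sketch: verify the Langfuse keys and host are valid before running evaluations.
# Placeholder values; use the keys saved from your Langfuse project.
import os
from langfuse import Langfuse

os.environ.setdefault("LANGFUSE_PUBLIC_KEY", "pk-lf-your-public-key")
os.environ.setdefault("LANGFUSE_SECRET_KEY", "sk-lf-your-secret-key")
os.environ.setdefault("LANGFUSE_HOST", "https://cloud.langfuse.com")  # or your self-hosted URL

print(Langfuse().auth_check())  # True when the credentials and host are correct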

SageMaker Notebook Deployment Steps

  1. Create a SageMaker notebook instance in your AWS account

  2. Open a terminal and navigate to the SageMaker/ folder within the instance

cd SageMaker/

  3. Clone this repository

git clone https://github.com/aws-samples/open-source-bedrock-agent-evaluation.git

  4. Navigate to the repository and install the necessary requirements

cd open-source-bedrock-agent-evaluation/
pip3 install -r requirements.txt

Local Deployment Steps

  1. Clone this repository

git clone https://github.com/aws-samples/open-source-bedrock-agent-evaluation.git

  2. Navigate to the repository and install the necessary requirements

cd open-source-bedrock-agent-evaluation/
pip3 install -r requirements.txt

  3. Set up the AWS CLI to access AWS account resources locally (https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html); a quick credentials check sketch follows this list
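To confirm the CLI credentials from step 3 resolve to the expected account, a quick check (a sketch using boto3; if boto3 is not already present from requirements.txt, install it with pip3 install boto3) is:

# Sketch: confirm local AWS credentials resolve to the expected account/role.
import boto3

identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])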

Agent Evaluation Options

  1. Bring your own agent to evaluate
  2. Create sample agents from this repository and run evaluations

Option 1: Bring your own agent to evaluate

  1. Bring the existing agent you want to evaluate (RAG and Text2SQL evaluations are currently built in)

  2. Create a dataset file for evaluations, either manually or with the generator (refer to data_files/sample_data_file.json for the required format; a hypothetical example follows this list)

  3. Copy the template configuration file and fill in the necessary information

cp config_tpl.env.tpl config.env

  4. Run driver.py to execute the evaluation job against the dataset

python3 driver.py

  5. Check your Langfuse project console to see the evaluation results!
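The exact dataset schema is defined by data_files/sample_data_file.json; the snippet below is only a hypothetical illustration of producing such a file programmatically, and every field name in it is invented:

# Hypothetical illustration only: the field names here are invented. Copy the real
# schema from data_files/sample_data_file.json in this repository.
import json

dataset = [
    {
        "question_id": 1,
        "question_type": "RAG",
        "question": "What does the knowledge base say about the refund policy?",
        "ground_truth": "Refunds are issued within 30 days of purchase.",
    },
    {
        "question_id": 2,
        "question_type": "TEXT2SQL",
        "question": "How many orders were placed in 2023?",
        "ground_truth": "SELECT COUNT(*) FROM orders WHERE order_year = 2023;",
    },
]

with open("data_files/my_eval_dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)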

Option 2: Create Sample Agents to run Evaluations

Follow the instructions in the Blog Sample Agents README. This is a guided way to run the evaluation framework on pre-created Bedrock Agents.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.


