Guardrails

Guardrails run in parallel to your agents, enabling you to check and validate user input. For example, imagine you have an agent that uses a very smart (and hence slow/expensive) model to help with customer requests. You wouldn't want malicious users to ask the model to help them with their math homework. So, you can run a guardrail with a fast/cheap model. If the guardrail detects malicious usage, it can immediately raise an error, which stops the expensive model from running and saves you time and money.
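The idea of a cheap check running alongside an expensive call can be sketched with plain asyncio. This is a minimal illustration and not the SDK's implementation: `cheap_guardrail`, `expensive_agent`, `run_with_guardrail`, and `SketchTripwire` are hypothetical stand-ins, and the sleeps stand in for model latency.

```python
import asyncio


class SketchTripwire(Exception):
    """Hypothetical stand-in for a tripwire exception (not the SDK class)."""


async def cheap_guardrail(user_input: str) -> bool:
    # Stand-in for a fast/cheap model call; True means the input is off-limits.
    await asyncio.sleep(0.01)
    return "solve for x" in user_input.lower()


async def expensive_agent(user_input: str) -> str:
    # Stand-in for a slow/expensive model call.
    await asyncio.sleep(0.05)
    return f"Support answer for: {user_input}"


async def run_with_guardrail(user_input: str) -> str:
    # Start the agent right away so the guardrail adds no latency when it passes.
    agent_task = asyncio.create_task(expensive_agent(user_input))
    try:
        if await cheap_guardrail(user_input):
            raise SketchTripwire(user_input)
        return await agent_task
    finally:
        # Has no effect if the agent already finished; otherwise it stops the expensive call.
        agent_task.cancel()
```

Calling `asyncio.run(run_with_guardrail(...))` with a support question returns the agent's answer, while a homework-style question raises `SketchTripwire` before the slow call completes.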

There are two kinds of guardrails:

Input guardrails run on the initial user input
Output guardrails run on the final agent output

Input guardrails

Input guardrails run in 3 steps:

First, the guardrail receives the same input passed to the agent.
Next, the guardrail function runs to produce a GuardrailFunctionOutput, which is then wrapped in an InputGuardrailResult.
Finally, we check if .tripwire_triggered is true. If true, an InputGuardrailTripwireTriggered exception is raised, so you can appropriately respond to the user or handle the exception.
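The three steps above can be sketched without the SDK. The classes below are simplified, hypothetical stand-ins that only mirror the roles of GuardrailFunctionOutput, InputGuardrailResult, and InputGuardrailTripwireTriggered; their fields (other than `output_info` and `tripwire_triggered`, which appear in the example later on this page) are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SketchFunctionOutput:
    # Plays the role of GuardrailFunctionOutput: extra info plus the tripwire flag.
    output_info: Any
    tripwire_triggered: bool


@dataclass
class SketchInputResult:
    # Plays the role of InputGuardrailResult: wraps the function output.
    guardrail_name: str
    output: SketchFunctionOutput


class SketchInputTripwire(Exception):
    # Plays the role of InputGuardrailTripwireTriggered.
    def __init__(self, result: SketchInputResult) -> None:
        super().__init__(f"Guardrail {result.guardrail_name!r} tripped")
        self.result = result


def check_input(
    guardrail: Callable[[str], SketchFunctionOutput], user_input: str
) -> SketchInputResult:
    # Step 1: the guardrail receives the same input the agent would get.
    # Step 2: run it and wrap its output in a result object.
    result = SketchInputResult(guardrail.__name__, guardrail(user_input))
    # Step 3: raise if the tripwire fired, so the caller can respond.
    if result.output.tripwire_triggered:
        raise SketchInputTripwire(result)
    return result


def no_homework(user_input: str) -> SketchFunctionOutput:
    # Toy keyword check standing in for a real guardrail function.
    is_homework = "homework" in user_input.lower()
    return SketchFunctionOutput({"is_homework": is_homework}, is_homework)
```

A caller would wrap `check_input(no_homework, text)` in `try`/`except SketchInputTripwire` and read `exc.result.output.output_info` to decide how to reply.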

Note

Input guardrails are intended to run on user input, so an agent's guardrails only run if the agent is the first agent. You might wonder, why is the guardrails property on the agent instead of passed to Runner.run? It's because guardrails tend to be related to the actual Agent - you'd run different guardrails for different agents, so colocating the code is useful for readability.

Output guardrails

Output guardrails run in 3 steps:

First, the guardrail receives the output produced by the agent.
Next, the guardrail function runs to produce a GuardrailFunctionOutput, which is then wrapped in an OutputGuardrailResult.
Finally, we check if .tripwire_triggered is true. If true, an OutputGuardrailTripwireTriggered exception is raised, so you can appropriately respond to the user or handle the exception.

Note

Output guardrails are intended to run on the final agent output, so an agent's guardrails only run if the agent is the last agent. Similar to the input guardrails, we do this because guardrails tend to be related to the actual Agent - you'd run different guardrails for different agents, so colocating the code is useful for readability.
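The first-agent and last-agent rules from the two notes can be made concrete with a small model of a run that passes through several agents. `SketchAgent` and `guardrails_that_run` are hypothetical names used only for illustration, and guardrails are represented by plain strings rather than real guardrail objects.

```python
from dataclasses import dataclass, field


@dataclass
class SketchAgent:
    # Hypothetical stand-in for an agent that carries its own guardrails.
    name: str
    input_guardrails: list[str] = field(default_factory=list)
    output_guardrails: list[str] = field(default_factory=list)


def guardrails_that_run(chain: list[SketchAgent]) -> dict[str, list[str]]:
    """Return which guardrails apply when control passes through `chain` in order."""
    if not chain:
        return {"input": [], "output": []}
    return {
        # Input guardrails come from the first agent only.
        "input": list(chain[0].input_guardrails),
        # Output guardrails come from the last agent only.
        "output": list(chain[-1].output_guardrails),
    }
```

With a two-agent chain, the first agent's output guardrails and the second agent's input guardrails are skipped, which is the behavior the notes describe.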

Tripwires

If the input or output fails the guardrail, the guardrail can signal this with a tripwire. As soon as we see a guardrail that has triggered its tripwire, we immediately raise an {Input,Output}GuardrailTripwireTriggered exception and halt the Agent execution.
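One way to picture "as soon as we see a tripwire" is several checks running concurrently, with the first tripped check ending the run. The sketch below uses plain asyncio and assumes each guardrail returns a `(name, tripped)` tuple; that shape, the cancellation of remaining checks, and the names `run_guardrails`, `fast_check`, `slow_check`, and `SketchTripwireTriggered` are all illustrative assumptions, not the SDK's behavior.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed shape for this sketch: each guardrail returns (name, tripped).
Guardrail = Callable[[str], Awaitable[tuple[str, bool]]]


class SketchTripwireTriggered(Exception):
    """Hypothetical stand-in for the {Input,Output} tripwire exceptions."""


async def run_guardrails(guardrails: list[Guardrail], data: str) -> list[str]:
    tasks = [asyncio.create_task(g(data)) for g in guardrails]
    passed: list[str] = []
    try:
        # Handle results in completion order so a tripwire is seen as early as possible.
        for finished in asyncio.as_completed(tasks):
            name, tripped = await finished
            if tripped:
                raise SketchTripwireTriggered(name)
            passed.append(name)
        return passed
    finally:
        # Stop any checks still running; has no effect on finished tasks.
        for task in tasks:
            task.cancel()


async def fast_check(text: str) -> tuple[str, bool]:
    await asyncio.sleep(0.01)
    return "fast_check", "math" in text.lower()


async def slow_check(text: str) -> tuple[str, bool]:
    await asyncio.sleep(0.05)
    return "slow_check", False
```

Here a tripped `fast_check` raises before `slow_check` finishes, so the slower check is cancelled instead of running to completion.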

Implementing a guardrail

You need to provide a function that receives input, and returns a GuardrailFunctionOutput. In this example, we'll do this by running an Agent under the hood.

from pydantic import BaseModel
from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrailTripwireTriggered,
    RunContextWrapper,
    Runner,
    TResponseInputItem,
    input_guardrail,
)


class MathHomeworkOutput(BaseModel):
    is_math_homework: bool
    reasoning: str


guardrail_agent = Agent(  # (1)
    name="Guardrail check",
    instructions="Check if the user is asking you to do their math homework.",
    output_type=MathHomeworkOutput,
)


@input_guardrail
async def math_guardrail(  # (2)
    ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)

    return GuardrailFunctionOutput(
        output_info=result.final_output,  # (3)
        tripwire_triggered=result.final_output.is_math_homework,
    )


agent = Agent(  # (4)
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    input_guardrails=[math_guardrail],
)


async def main():
    # This should trip the guardrail
    try:
        await Runner.run(agent, "Hello, can you help me solve for x: 2x + 3 = 11?")
        print("Guardrail didn't trip - this is unexpected")

    except InputGuardrailTripwireTriggered:
        print("Math homework guardrail tripped")

(1) We'll use this agent in our guardrail function.
(2) This is the guardrail function that receives the agent's input/context, and returns the result.
(3) We can include extra information in the guardrail result.
(4) This is the actual agent that defines the workflow.

Output guardrails are similar.

from pydantic import BaseModel
from agents import (
    Agent,
    GuardrailFunctionOutput,
    OutputGuardrailTripwireTriggered,
    RunContextWrapper,
    Runner,
    output_guardrail,
)


class MessageOutput(BaseModel):  # (1)
    response: str


class MathOutput(BaseModel):  # (2)
    reasoning: str
    is_math: bool


guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the output includes any math.",
    output_type=MathOutput,
)


@output_guardrail
async def math_guardrail(  # (3)
    ctx: RunContextWrapper, agent: Agent, output: MessageOutput
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, output.response, context=ctx.context)

    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_math,
    )


agent = Agent(  # (4)
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    output_guardrails=[math_guardrail],
    output_type=MessageOutput,
)


async def main():
    # This should trip the guardrail
    try:
        await Runner.run(agent, "Hello, can you help me solve for x: 2x + 3 = 11?")
        print("Guardrail didn't trip - this is unexpected")

    except OutputGuardrailTripwireTriggered:
        print("Math output guardrail tripped")

(1) This is the actual agent's output type.
(2) This is the guardrail's output type.
(3) This is the guardrail function that receives the agent's output, and returns the result.
(4) This is the actual agent that defines the workflow.
