Updated prompt injection check #27


Merged
gabor-openai merged 5 commits into main from dev/steve/PI_eval on Oct 29, 2025

Conversation

@steven10a
Collaborator

steven10a commented Oct 28, 2025 (edited)

  • Updated the system prompt of the prompt injection guardrail for better performance
  • Small change to llm_base so all LLM-based checks use a shared error reporter, and updated the other LLM checks to use it (a rough sketch follows below)
  • Updated the eval tool to properly parse multi-turn input data
  • Updated evals with the results of V2
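The shared error reporter is only described, not shown, here; a minimal sketch of what such a helper could look like, assuming a result type with a tripwire flag and an info dict (the real signature and fields in llm_base.py may differ):

from dataclasses import dataclass, field
from typing import Any


@dataclass
class GuardrailResult:
    tripwire_triggered: bool
    info: dict[str, Any] = field(default_factory=dict)


def create_error_result(guardrail_name: str, error: Exception) -> GuardrailResult:
    """Return a non-triggering result that records why an LLM-based check failed."""
    return GuardrailResult(
        tripwire_triggered=False,
        info={
            "guardrail_name": guardrail_name,
            "error": f"{type(error).__name__}: {error}",
        },
    )


# Each LLM-based check could then report failures uniformly, e.g.:
# except Exception as exc:
#     return create_error_result("Prompt Injection Detection", exc)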

Copilot AI review requested due to automatic review settings on October 28, 2025 18:43

Copilot AI left a comment


Pull Request Overview

This PR enhances the Prompt Injection Detection guardrail to focus exclusively on analyzing tool calls and tool outputs, while improving the evidence gathering and evaluation framework. The changes refine the security model to only flag content with direct evidence of malicious instructions, rather than inferring injection from behavioral symptoms.

Key changes:

  • Updated prompt injection detection to skip assistant content messages and only analyze tool calls/outputs
  • Added an evidence field to PromptInjectionDetectionOutput for capturing specific injection indicators (see the sketch below)
  • Enhanced conversation history parsing to gracefully handle non-JSON data
  • Refactored error handling with a shared create_error_result helper function
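A rough sketch of how the new evidence field might sit on the output model, assuming a pydantic schema; field names other than evidence are illustrative, not the repository's exact definition:

from typing import Optional

from pydantic import BaseModel, Field


class PromptInjectionDetectionOutput(BaseModel):
    """Illustrative output schema; only the evidence field is new in this PR."""

    flagged: bool = Field(description="True only when there is direct evidence of injected instructions.")
    confidence: float = Field(ge=0.0, le=1.0, description="Model confidence in the judgment.")
    evidence: Optional[str] = Field(default=None, description="Verbatim snippet of the suspicious instruction, if any.")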

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file

  • src/guardrails/checks/text/prompt_injection_detection.py: core logic updates (skip assistant messages, add evidence field, enhance the system prompt with detailed injection detection criteria)
  • tests/unit/checks/test_prompt_injection_detection.py: comprehensive test coverage for the new skip behavior, assistant message handling, and tool output injection scenarios
  • src/guardrails/evals/core/async_engine.py: enhanced conversation parsing to handle plain strings and non-conversation JSON (sketched below), plus support for the Jailbreak guardrail
  • src/guardrails/evals/core/types.py: added conversation_history field and getter method to the Context class
  • src/guardrails/checks/text/llm_base.py: extracted the create_error_result helper function for consistent error handling
  • src/guardrails/checks/text/hallucination_detection.py: updated to use the shared create_error_result helper
  • tests/integration/test_suite.py: commented out multiple test cases, removed config fields
  • src/guardrails/evals/.gitignore: added PI_eval/ directory to the ignore list
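The "plain strings and non-conversation JSON" handling could look roughly like the following sketch; the helper name and exact fallback behavior are assumptions, and async_engine.py's actual code may be structured differently:

import json
from typing import Any


def parse_conversation_history(raw: str) -> list[dict[str, Any]]:
    """Best-effort parse of eval input into a conversation list."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all: treat the raw text as a single user turn.
        return [{"role": "user", "content": raw}]
    if isinstance(data, list) and all(isinstance(turn, dict) and "role" in turn for turn in data):
        return data  # Already a multi-turn conversation.
    # Valid JSON, but not a conversation (e.g. a bare string or object).
    return [{"role": "user", "content": raw}]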



Copilot AI left a comment


Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.




Copilot AI left a comment


Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.



Comment on lines 237 to 244
"guardrails": [
{
"name":guardrail.definition.name,
"config": (guardrail.config.__dict__ifhasattr(guardrail.config,"__dict__")elseguardrail.config),
}
forguardrailinself.guardrails
ifguardrail.definition.name=="Prompt Injection Detection"
ifguardrail.definition.nameinconversation_aware_names
],


The configuration creation logic filters guardrails by name match with conversation_aware_names, but this creates a minimal config with only conversation-aware guardrails. If self.guardrails doesn't contain a guardrail matching the expected trigger name from sample.expected_triggers, the minimal_config will have an empty guardrails list, which could cause the evaluation to fail silently or produce incorrect results. The filtering should ensure at least one matching guardrail exists or handle the empty case.
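A guard for the empty case described here might look like the following; this is a sketch with assumed names, not the evaluation engine's actual code:

from typing import Any


def build_minimal_config(guardrails: list[Any], conversation_aware_names: set[str]) -> dict[str, Any]:
    """Build a config limited to conversation-aware guardrails, refusing an empty result."""
    selected = [
        {
            "name": g.definition.name,
            "config": getattr(g.config, "__dict__", g.config),
        }
        for g in guardrails
        if g.definition.name in conversation_aware_names
    ]
    if not selected:
        raise ValueError("No conversation-aware guardrails configured; cannot run this evaluation sample.")
    return {"guardrails": selected}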

Collaborator (Author)


This is in an if statement that handles that case

"""
normalized_messages=normalize_conversation(messages)
user_texts= [entry["content"]forentryinnormalized_messagesifentry.get("role")=="user"andisinstance(entry.get("content"),str)]
user_texts= [entry["content"]forentryinmessagesifentry.get("role")=="user"andisinstance(entry.get("content"),str)]


This list comprehension will raise a TypeError if entry["content"] is not a string but is a truthy non-string type (e.g., a list or dict). The isinstance check happens after the value is already accessed with entry["content"], but the value could be any type. Consider using .get("content") instead of direct access, or handle the case where content might be None before the isinstance check.

Suggested change
user_texts = [entry["content"] for entry in messages if entry.get("role") == "user" and isinstance(entry.get("content"), str)]
user_texts = [entry.get("content") for entry in messages if entry.get("role") == "user" and isinstance(entry.get("content"), str)]

Collaborator (Author)


We are receiving a normalized message list, so this is not an issue

Collaborator

@gabor-openai gabor-openai left a comment


LGTM thank you

@gabor-openai gabor-openai merged commit ab3f458 into main on Oct 29, 2025
3 checks passed
@gabor-openai gabor-openai deleted the dev/steve/PI_eval branch on October 29, 2025 16:54

Reviewers

Copilot code review: Copilot left review comments

@gabor-openai gabor-openai approved these changes

3 participants

@steven10a, @gabor-openai
