Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Dev/steven/nsfw docs#30

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
gabor-openai merged 4 commits intomainfromdev/steven/nsfw_docs
Oct 29, 2025
Merged

Dev/steven/nsfw docs#30

gabor-openai merged 4 commits intomainfromdev/steven/nsfw_docs
Oct 29, 2025

Conversation

@steven10a
Copy link
Collaborator

Adding nsfw docs and results

CopilotAI review requested due to automatic review settingsOctober 29, 2025 02:02
Copy link

CopilotAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Pull Request Overview

This PR enhances the Prompt Injection Detection guardrail with improved analysis capabilities, better test coverage, and broader conversation-aware guardrail support. The changes focus on detecting malicious instructions in tool calls and tool outputs that deviate from user intent.

Key changes:

  • Enhanced prompt injection detection to analyze tool outputs for embedded injection directives (fake conversations, response manipulation)
  • Extended evaluation framework to support multiple conversation-aware guardrails beyond just prompt injection detection
  • Added comprehensive test coverage for various injection attack patterns and edge cases

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
FileDescription
src/guardrails/checks/text/prompt_injection_detection.pyEnhanced detection logic with evidence field, improved prompts for tool output analysis, and updated docstrings to focus on tool calls/outputs
src/guardrails/checks/text/llm_base.pyExtractedcreate_error_result helper function for standardized error handling
src/guardrails/checks/text/hallucination_detection.pyRefactored to use newcreate_error_result helper for consistent error handling
src/guardrails/evals/core/async_engine.pyExtended conversation-aware support to multiple guardrails (Jailbreak, Prompt Injection), improved payload parsing to handle non-JSON strings
src/guardrails/evals/core/types.pyAddedconversation_history field andget_conversation_history method to Context class
tests/unit/checks/test_prompt_injection_detection.pyAdded comprehensive tests for injection patterns, assistant message handling, and edge cases
tests/unit/evals/test_async_engine.pyUpdated test to reflect new behavior of wrapping non-JSON strings as user messages
tests/integration/test_suite.pyRemoved redundant config fields from pipeline configuration
tests/unit/test_resources_responses.pyAdded blank line for formatting
src/guardrails/evals/.gitignoreAddedPI_eval/ directory to gitignore
mkdocs.ymlReorganized checks documentation alphabetically
docs/ref/checks/nsfw.mdUpdated benchmark results with new model performance metrics

💡Add Copilot custom instructions for smarter, more guided reviews.Learn how to get started.

Comment on lines +233 to 235
# Create a minimal guardrails config for conversation-aware checks
minimal_config= {
"version":1,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The config dictionary is missing thestage_name key that was previously present. While this may be intentional cleanup, the code should ensure the minimal config structure is valid and matches whatGuardrailsAsyncOpenAI expects. Consider adding a comment explaining the minimal required structure.

Suggested change
# Create a minimal guardrails config for conversation-aware checks
minimal_config= {
"version":1,
# Create a minimal guardrails config for conversation-aware checks.
# The minimal required structure for GuardrailsAsyncOpenAI includes:
# - "version": config version
# - "stage_name": name of the stage (e.g., "output")
# - "output": { "guardrails": [ ... ] }
minimal_config= {
"version":1,
"stage_name":"output",

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@gabor-openaigabor-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM TY

@gabor-openaigabor-openai merged commit12c4add intomainOct 29, 2025
9 checks passed
@gabor-openaigabor-openai deleted the dev/steven/nsfw_docs branchOctober 29, 2025 16:57
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

Copilot code reviewCopilotCopilot left review comments

@gabor-openaigabor-openaigabor-openai approved these changes

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@steven10a@gabor-openai

[8]ページ先頭

©2009-2025 Movatter.jp