llm-safety

Here are 12 public repositories matching this topic...

Attack to induce hallucinations in LLMs

  • Updated May 17, 2024
  • Python

Papers about red teaming LLMs and Multimodal models.

  • Updated Nov 22, 2024

Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"

  • Updated Sep 20, 2024
  • Python

Restore safety in fine-tuned language models through task arithmetic

  • Updated Mar 28, 2024
  • Python

NeurIPS'24 - LLM Safety Landscape

  • Updated Feb 25, 2025
  • Python

Code and dataset for the paper: "Can Editing LLMs Inject Harm?"

  • Updated Nov 9, 2024
  • Python

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

  • Updated Feb 26, 2025
  • Python

Some Thoughts on AI Alignment: Using AI to Control AI

  • Updated Feb 25, 2025

Comprehensive LLM testing suite for safety, performance, bias, and compliance, equipped with methodologies and tools to enhance the reliability and ethical integrity of models like OpenAI's GPT series for real-world applications.

  • Updated Apr 15, 2024

Safety benchmarking

  • Updated Mar 28, 2025
  • Python


[8]ページ先頭

©2009-2025 Movatter.jp