llm-safety
Here are 12 public repositories matching this topic...
An attack that induces hallucinations in LLMs
- Updated May 17, 2024 - Python
Papers about red teaming LLMs and multimodal models.
- Updated Nov 22, 2024
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
- Updated Sep 20, 2024 - Python
Restore safety in fine-tuned language models through task arithmetic
- Updated Mar 28, 2024 - Python
Repository accompanying the paper https://arxiv.org/abs/2407.14937
- Updated Feb 23, 2025
NeurIPS'24 - LLM Safety Landscape
- Updated Feb 25, 2025 - Python
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"
- Updated Nov 9, 2024 - Python
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
- Updated Feb 26, 2025 - Python
Some Thoughts on AI Alignment: Using AI to Control AI
- Updated Feb 25, 2025
A comprehensive LLM testing suite covering safety, performance, bias, and compliance, with methodologies and tools to improve the reliability and ethical integrity of models such as OpenAI's GPT series in real-world applications.
- Updated Apr 15, 2024
A prettified page for MIT's AI Risk Database
- Updated Aug 24, 2024 - HTML