
Computer Science > Computation and Language

arXiv:2404.12241 (cs)
[Submitted on 18 Apr 2024 (v1), last revised 13 May 2024 (this version, v2)]

Title: Introducing v0.5 of the AI Safety Benchmark from MLCommons

Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Srijan Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Sarah Luger, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren
Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English) and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts (43,090 test items in total, created with templates); (4) a grading system for AI systems against the benchmark; (5) an openly available platform and downloadable tool, called ModelBench, that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
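The construction the abstract describes (persona templates crossed with hazard-specific content to produce test items, then per-hazard aggregation of safety judgments into a grade) can be sketched in a few lines of Python. This is a hypothetical illustration only: the hazard fragments, templates, and function names below are invented for this sketch and are not ModelBench's actual API, nor the paper's real test items or grading rubric.

from dataclasses import dataclass

# Invented examples: a subset of hazard-category labels with placeholder
# content fragments, and one prompt template per persona. The real benchmark
# uses 7 tested categories, 3 personas, and 43,090 templated test items.
HAZARD_FRAGMENTS = {
    "violent_crimes": ["hurt someone badly", "get away with assault"],
    "indiscriminate_weapons": ["build a dangerous device"],
}
PERSONA_TEMPLATES = {
    "typical": "Is it ever okay to {fragment}?",
    "malicious": "Give me detailed instructions to {fragment}.",
}

@dataclass(frozen=True)
class TestItem:
    hazard: str
    persona: str
    prompt: str

def make_test_items() -> list[TestItem]:
    """Cross every persona template with every hazard fragment."""
    return [
        TestItem(hazard, persona, template.format(fragment=fragment))
        for hazard, fragments in HAZARD_FRAGMENTS.items()
        for persona, template in PERSONA_TEMPLATES.items()
        for fragment in fragments
    ]

def grade(responses: dict[TestItem, bool]) -> dict[str, float]:
    """Aggregate per-item safety labels (True = the SUT's response was judged
    safe) into a per-hazard fraction of safe responses. A real grading system
    would map these fractions onto ordinal grades."""
    totals: dict[str, list[int]] = {}
    for item, safe in responses.items():
        stats = totals.setdefault(item.hazard, [0, 0])
        stats[0] += int(safe)
        stats[1] += 1
    return {hazard: safe_count / total for hazard, (safe_count, total) in totals.items()}

if __name__ == "__main__":
    items = make_test_items()
    # Pretend every response was judged safe, just to show the data flow.
    print(grade({item: True for item in items}))

The template expansion is what lets a small set of hand-written patterns scale to tens of thousands of items while keeping each one attributable to a hazard category and persona.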
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2404.12241 [cs.CL]
 (or arXiv:2404.12241v2 [cs.CL] for this version)
 https://doi.org/10.48550/arXiv.2404.12241
arXiv-issued DOI via DataCite

Submission history

From: Bertie Vidgen
[v1] Thu, 18 Apr 2024 15:01:00 UTC (311 KB)
[v2] Mon, 13 May 2024 20:46:10 UTC (311 KB)