The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs


alphadl/OOP-eval


OOP Benchmark

OOP is a code generation benchmark that quantifies the object-oriented programming ability of Large Language Models (LLMs); details can be found in our paper "OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models" [HuggingFace Link]. We collected code snippets from LeetCode, open-source repositories on GitHub, Stack Overflow, and Codewars, and all test samples have undergone carefully designed post-processing.

We show that 🔎:

  • ⚠️ Despite excelling at functional programming (FP) benchmarks such as HumanEval and MBPP, code-specialized LLMs like WizardCoder lag behind proprietary models like ChatGPT on our OOP benchmark;
  • 🚀 The poor performance of all advanced LLMs on our OOP benchmark highlights a critical need for improvement in this area.

📢 News: [May 15, 2024] OOP has been accepted by ACL 2024 Findings.

Basic Statistics

  • OOP consists of 431 instances;
  • OOP contains three difficulty levels: Simple-level OOP, Moderate-level OOP, and Difficult-level OOP.
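To illustrate the difference between FP-style benchmarks (HumanEval, MBPP) and an OOP-style task, here is a hypothetical sketch of what a class-based code-generation instance and its unit-test check might look like. The `BankAccount` class, its prompt, and the `check` harness are illustrative examples only, not instances drawn from the OOP benchmark:

```python
# Hypothetical OOP-style task: the model must generate a class with state
# and methods, rather than a single standalone function as in HumanEval.
# Illustrative prompt: "Implement a BankAccount class supporting
# deposit(amount) and withdraw(amount); withdraw must reject overdrafts."

class BankAccount:
    def __init__(self, balance: float = 0.0):
        self.balance = balance

    def deposit(self, amount: float) -> float:
        self.balance += amount
        return self.balance

    def withdraw(self, amount: float) -> float:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


def check(cls) -> bool:
    # A pass/fail check in the spirit of unit-test-based evaluation:
    # exercise the class's state and its error-handling behavior.
    acct = cls(10.0)
    acct.deposit(5.0)
    try:
        acct.withdraw(100.0)
        return False  # overdraft should have raised
    except ValueError:
        pass
    return acct.withdraw(15.0) == 0.0


print(check(BankAccount))  # True for a correct generation
```

A generated solution passes only if the whole class behaves correctly, so the check exercises object state across multiple method calls instead of a single input-output pair.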

Performance of widely-used LLMs


Citations

Please cite the paper and star this repo if you use OOP and find it helpful. Feel free to contact wangshuai123@whu.edu.cn or open an issue if you have any questions.

@inproceedings{wang2024oop,
  title={OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models},
  author={Wang, Shuai and Ding, Liang and Shen, Li and Luo, Yong and Du, Bo and Tao, Dacheng},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2024},
  year={2024}
}

