OOP-eval
OOP is a code generation benchmark that quantifies the object-oriented programming ability of Large Language Models (LLMs); the details can be found in our paper "OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models" | [HuggingFace Link]. We collect code snippets from LeetCode, open-source repositories on GitHub, Stack Overflow, and Codewars, and all test samples have undergone carefully designed post-processing.
We show that 🔎:
- ⚠️ Despite excelling at functional programming (FP) benchmarks such as HumanEval and MBPP, code-specialized LLMs like WizardCoder lag behind proprietary models like ChatGPT on our OOP benchmark;
- 🚀 The poor performance of all advanced LLMs on our OOP benchmark highlights a critical need for improvement in this field.
📢 News: [May 15, 2024] OOP has been accepted by ACL 2024 Findings.
- OOP consists of 431 instances;
- OOP contains three difficulty levels: Simple-level OOP, Moderate-level OOP, and Difficult-level OOP (see the loading sketch below).
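To make the structure above concrete, here is a minimal sketch of how the benchmark instances could be loaded and grouped by difficulty level. This is not the repository's official evaluation harness: the JSONL layout, the file name, and the "level" field name are assumptions made purely for illustration.

```python
# Minimal sketch (assumed layout, not the official harness): load OOP
# benchmark instances from a JSONL file and count them per difficulty level.
import json
from collections import Counter


def load_oop_samples(path: str) -> list[dict]:
    """Read one benchmark instance per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # "oop_benchmark.jsonl" is a placeholder path, not the repo's actual file.
    samples = load_oop_samples("oop_benchmark.jsonl")
    print(f"loaded {len(samples)} instances")  # the paper reports 431

    # The "level" field name is an assumption; adjust to the real schema.
    by_level = Counter(s.get("level", "unknown") for s in samples)
    for level, count in by_level.items():
        print(f"{level}: {count}")
```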
Please cite the paper and star this repo if you use OOP and find it helpful. Feel free to contact wangshuai123@whu.edu.cn or open an issue if you have any questions.
@inproceedings{wang2024oop,
  title={OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models},
  author={Wang, Shuai and Ding, Liang and Shen, Li and Luo, Yong and Du, Bo and Tao, Dacheng},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2024},
  year={2024}
}
About
The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs