This is a tool for evaluating fill-in-the-middle (FIM) code generation: it generates and evaluates completions on datasets in four languages: Java, Python, C++, and JavaScript.
Introduction to datasets and evaluation metrics
For each of the four languages (Java, Python, C++, and JavaScript), the task provides the code above and below a masked block (the prefix and suffix), and the model must predict the missing middle. Evaluation metrics include:
- Exact Match
- BLEU-4
- CodeBLEU
- Length(Pred/Ref)
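As an illustration, the sketch below shows how Exact Match, BLEU-4, and the Length(Pred/Ref) ratio can be computed for a single prediction. This is a minimal sketch using nltk for BLEU; the repository's own scoring code may tokenize and smooth differently, and CodeBLEU additionally weighs syntax (AST) and data-flow matches, which are not reproduced here.

```python
# Minimal per-sample metric sketch (illustrative; not the repo's exact scoring code).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def score_sample(prediction: str, reference: str) -> dict:
    # Exact Match: 1 if the generated middle equals the reference after trimming whitespace.
    exact_match = int(prediction.strip() == reference.strip())

    # BLEU-4: 4-gram BLEU over whitespace tokens, smoothed for short code snippets.
    pred_toks, ref_toks = prediction.split(), reference.split()
    bleu4 = sentence_bleu(
        [ref_toks], pred_toks,
        weights=(0.25, 0.25, 0.25, 0.25),
        smoothing_function=SmoothingFunction().method1,
    )

    # Length(Pred/Ref): ratio of predicted length to reference length.
    length_ratio = len(pred_toks) / max(len(ref_toks), 1)

    return {"exact_match": exact_match, "bleu4": bleu4, "length_pred_ref": length_ratio}

print(score_sample("return a + b", "return a + b"))
```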
Environment Requirements
To run the model inference and evaluation code, you'll need the following environment setup:
Python 3.8 or higher
PyTorch 2.1.0 or higher
sentencepiece 0.2.0 or higher
transformers 4.34.1 or higher (if running inference with the transformers library; see the sketch after this list)
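For reference, here is a minimal sketch of FIM generation through the transformers library. It uses StarCoder2's documented <fim_prefix>/<fim_suffix>/<fim_middle> sentinel tokens; the other supported models use different FIM tokens, so consult each model card before adapting this.

```python
# Minimal FIM sketch with transformers (illustrative; FIM token names vary by model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoder2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prefix = "def add(a, b):\n    "
suffix = "\n    return result\n"
# StarCoder2 uses prefix-suffix-middle (PSM) formatting for infilling.
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated middle, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```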
Please ensure all dependencies are installed using the following command:
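```
pip install -r requirements.txt
```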
requirements.txt lists all necessary libraries and their versions.
To achieve faster inference speeds, especially for large models, we recommend installing flash attention. Flash attention is an optimized attention mechanism that significantly reduces computation time for transformer-based models without sacrificing accuracy.
Before proceeding, ensure your environment meets the CUDA requirements, as flash attention leverages GPU acceleration. Follow these steps to install flash attention:
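The command below follows the flash-attn project's standard installation instructions; build requirements depend on your CUDA and PyTorch versions, so check the flash-attn README for compatibility. Once installed, it can be enabled through the --attn_implementation flag described below.

```
pip install ninja
pip install flash-attn --no-build-isolation
```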
Inference
Here's an example of a generation task:
python run_inference.py --model aiXcoder/aixcoder-7b-base --language java
--model Model name on Hugging Face.
Currently, FIM generation is supported for four models on Hugging Face:
- deepseek-ai/deepseek-coder-6.7b-base
- aiXcoder/aixcoder-7b-base
- codellama/CodeLlama-7b-hf
- bigcode/starcoder2-7b
You can also pass the path to model weights that have already been downloaded locally.
--language Dataset language.
Four languages are supported: Python, Java, C++, and JavaScript. You can specify a single language or several at once, separated by spaces.
--output_dir The output path for generated results; by default they are saved in the output_dir folder in the current directory.
--device Sets the CUDA device to use; default cuda.
--torch_dtype Sets the precision; default bf16. Can be set to: "fp32", "fp16", "bf16".
--attn_implementation Whether to use FlashAttention; default True. If your environment does not support FlashAttention, set this to False.
--gen_len Sets the maximum generation length (max_new_tokens); default 512.
--max_len Sets the maximum total sequence length (prompt plus generated tokens); default 16384.
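Putting several options together, an invocation might look like this (illustrative values; all flags are described above):

```
python run_inference.py --model deepseek-ai/deepseek-coder-6.7b-base --language java python --output_dir output_dir --torch_dtype bf16 --gen_len 512
```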
Evaluation
Here's an example of an evaluation task:
python run_evaluate.py
--language The language to be evaluated.
Four languages are supported: Python, Java, C++, and JavaScript. You can specify a single language or several at once, separated by spaces.
--result_path The output path for evaluation results; by default they are stored in the output_dir folder in the current directory. Two files are generated, with the suffixes _scored.jsonl and _statistics.txt. The _statistics.txt file records the scores for each Task Type as well as the averaged overall results.
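For example, to evaluate the generated results for Java and Python in one run (illustrative path; flags as described above):

```
python run_evaluate.py --language java python --result_path output_dir
```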