Commit d0b8457

Merge pull request #58 from codefuse-ai/v0.4.dev (V0.4.dev)
2 parents: cf38e74 + a2a0bff

31 files changed: +3689 −1059 lines

‎README.md

Lines changed: 23 additions & 17 deletions
@@ -46,14 +46,19 @@
 ## News
+🔥🔥🔥 [2024/05/20] We released **MFTCoder v0.4.2**, mainly for MFTCoder-accelerate. It supports **QLoRA + DeepSpeed Zero3** and **QLoRA + FSDP** as options, allowing you to train very large models. It now supports new models like Qwen2, Qwen2-MoE, Starcoder2, Gemma, etc.
-🔥🔥🔥 [2024/01/30] The model [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) fine-tuned with MFTCoder ranks first on the HuggingFace [Big Code Models Leaderboard](https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard).
+🔥🔥🔥 [2024/05/20] Our paper [MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning](https://arxiv.org/abs/2311.02303) has been accepted by KDD 2024.
-🔥🔥🔥 [2024/01/17] We released MFTCoder v0.3.0, mainly for MFTCoder-accelerate. It now supports new models like Mixtral (MoE), DeepSeek-Coder, and ChatGLM3. It supports FSDP as an option, and adds Self-paced Loss as a solution for convergence balance in multitask fine-tuning.
+🔥🔥🔥 [2024/05/20] [CodeFuse-StarCoder2-15B](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder2-15B) has been released, achieving a pass@1 (greedy decoding) score of 73.2% on HumanEval.
-🔥🔥🔥 [2024/01/17] [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) has been released, achieving a pass@1 (greedy decoding) score of 78.7% on HumanEval. It is listed as the top-1 LLM on the Bigcode leaderboard in terms of win rate; the official result is going to be published later.
+🔥🔥 [2024/01/30] The model [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) fine-tuned with MFTCoder ranks first on the HuggingFace [Big Code Models Leaderboard](https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard).
-🔥🔥🔥 [2024/01/17] [CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8X7B) has been released, achieving a pass@1 (greedy decoding) score of 56.1% on HumanEval.
+🔥🔥 [2024/01/17] We released MFTCoder v0.3.0, mainly for MFTCoder-accelerate. It now supports new models like Mixtral (MoE), DeepSeek-Coder, and ChatGLM3. It supports FSDP as an option, and adds Self-paced Loss as a solution for convergence balance in multitask fine-tuning.
+🔥🔥 [2024/01/17] [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) has been released, achieving a pass@1 (greedy decoding) score of 78.7% on HumanEval. It is listed as the top-1 LLM on the Bigcode leaderboard in terms of win rate; the official result is going to be published later.
+🔥🔥 [2024/01/17] [CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8X7B) has been released, achieving a pass@1 (greedy decoding) score of 56.1% on HumanEval.
 🔥🔥 [2023/11/07] The [MFTCoder paper](https://arxiv.org/abs/2311.02303) has been released on Arxiv, disclosing the technical details of multitask fine-tuning.
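The reason QLoRA combines well with DeepSpeed Zero3 or FSDP is that the frozen base model is held in 4-bit precision while only small low-rank adapter matrices are trained. As a rough, hypothetical illustration (the function, sizes, and rank below are assumptions for a generic 70B-class model, not MFTCoder's actual configuration), the memory arithmetic looks like this:

```python
# Back-of-the-envelope weight-memory estimate for QLoRA fine-tuning.
# All sizes and the rank are illustrative assumptions, not MFTCoder settings.

def qlora_memory_gb(n_params_b: float, hidden: int, n_layers: int,
                    rank: int, adapted_mats_per_layer: int = 4) -> dict:
    """Estimate weight memory (GB) for a 4-bit base plus fp16 LoRA adapters.

    n_params_b: base model size in billions of parameters
    rank: LoRA rank r; each adapted hidden x hidden matrix adds 2*hidden*r params
    """
    base_bytes = n_params_b * 1e9 * 0.5          # 4-bit weights = 0.5 byte each
    lora_params = n_layers * adapted_mats_per_layer * 2 * hidden * rank
    lora_bytes = lora_params * 2                 # fp16 adapters, 2 bytes each
    return {
        "base_gb": base_bytes / 2**30,
        "lora_gb": lora_bytes / 2**30,
        "trainable_fraction": lora_params / (n_params_b * 1e9),
    }

# A hypothetical 70B model (hidden=8192, 80 layers) with rank-16 adapters:
est = qlora_memory_gb(70, hidden=8192, n_layers=80, rank=16)
print(est)  # trainable fraction is well under 1% of the base parameters
```

With the trainable weights this small, Zero3/FSDP only needs to shard optimizer state for the adapters, which is what makes fine-tuning 70B-class models feasible.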
@@ -73,6 +78,7 @@
 | **CodeFuse-DeepSeek-33B** | **78.7%** | 2024/01 |
 | **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
 | **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
+| **CodeFuse-StarCoder2-15B** | **73.2%** | 2023/05 |
 | WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
 | GPT-4 (zero-shot) | 67.0% | 2023/03 |
 | PanGu-Coder2 15B | 61.6% | 2023/08 |
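The pass@1 (greedy decoding) figures in this table mean a single completion is generated per HumanEval problem and scored against the unit tests. For k > 1, the standard unbiased estimator from the HumanEval/Codex evaluation is commonly used; a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per problem, c of them correct.

    Returns 1 - C(n-c, k) / C(n, k), the probability that at least one of
    k randomly drawn samples passes the tests.
    """
    if n - c < k:
        return 1.0  # too few failures left for k draws to all fail
    return 1.0 - comb(n - c, k) / comb(n, k)

# With greedy decoding, n = k = 1: pass@1 per problem is simply pass/fail,
# and the reported score is the average over all problems.
print(pass_at_k(1, 1, 1))                 # → 1.0 (solved problem)
print(round(pass_at_k(10, 3, 1), 2))      # → 0.3
```

Averaging this per-problem value over the benchmark yields the percentages shown above.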
@@ -88,7 +94,7 @@
 ## Articles
-[MFT Arxiv paper](https://arxiv.org/abs/2311.02303)
+[MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning (KDD2024)](https://arxiv.org/abs/2311.02303)
 
 ## Introduction

@@ -125,13 +131,13 @@ The main components of this project include:
 ## Requirements
-To begin, ensure that you have successfully installed CUDA (version >= 11.4, preferably 11.7) along with the necessary drivers. Additionally, make sure you have installed torch (version 2.0.1).
+To begin, ensure that you have successfully installed CUDA (version >= 11.4, preferably 12.1) along with the necessary drivers. Additionally, make sure you have installed torch (version >= 2.1.0).
 
 Next, we have provided an init_env.sh script to simplify the installation of required packages. Execute the following command to run the script:
 ```bash
 sh init_env.sh
 ```
-We highly recommend training with flash attention (version >= 2.1.0, preferably 2.3.6); please refer to the following link for installation instructions: https://github.com/Dao-AILab/flash-attention
+We highly recommend training with flash attention (version >= 2.3.0); please refer to the following link for installation instructions: https://github.com/Dao-AILab/flash-attention
 
 ## Training
@@ -152,16 +158,16 @@ If you want to explore some new framework like atorch, you could check:
 We are excited to release the following CodeLLMs trained by MFTCoder, now available on both HuggingFace and ModelScope:
 
-| Model | HuggingFace Links | ModelScope Links | Base Model | Num of examples trained | Batch Size | Seq Length |
-|---|---|---|---|---|---|---|
-| 🔥 CodeFuse-DeepSeek-33B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-DeepSeek-33B) | DeepSeek-coder-33B | 60万 | 80 | 4096 |
-| 🔥 CodeFuse-Mixtral-8x7B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Mixtral-8x7B) | Mixtral-8x7B | 60万 | 80 | 4096 |
-| 🔥 CodeFuse-CodeLlama-34B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 60万 | 80 | 4096 |
-| 🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
-| 🔥 CodeFuse-StarCoder-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder-15B) | StarCoder-15B | 60万 | 80 | 4096 |
-| 🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 110万 | 256 | 4096 |
-| 🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 110万 | 256 | 4096 |
+| Model | HuggingFace Links | ModelScope Links | Base Model | Num of examples trained | Batch Size | Seq Length |
+|---|---|---|---|---|---|---|
+| 🔥 CodeFuse-DeepSeek-33B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-DeepSeek-33B) | DeepSeek-coder-33B | 600K | 80 | 4096 |
+| 🔥 CodeFuse-Mixtral-8x7B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Mixtral-8x7B) | Mixtral-8x7B | 600K | 80 | 4096 |
+| 🔥 CodeFuse-CodeLlama-34B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 600K | 80 | 4096 |
+| 🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
+| 🔥 CodeFuse-StarCoder-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder-15B) | StarCoder-15B | 600K | 80 | 4096 |
+| 🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 1.1 Million | 256 | 4096 |
+| 🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 1.1 Million | 256 | 4096 |
+| 🔥 CodeFuse-StarCoder2-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder2-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder2-15B) | Starcoder2-15B | 700K | 128 | 4096 |
 
 ## Datasets
 We are also pleased to release two code-related instruction datasets, meticulously selected from a range of datasets to facilitate multitask training. Moving forward, we are committed to releasing additional instruction datasets covering various code-related tasks.
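Multitask training over several instruction datasets typically interleaves examples according to per-task sampling weights. The sketch below is a hypothetical illustration of that idea (the task names, weights, and `make_sampler` helper are invented here, not part of MFTCoder's API):

```python
import random

def make_sampler(datasets: dict[str, list], weights: dict[str, float], seed: int = 0):
    """Yield (task_name, example) pairs, drawing tasks in proportion to weights."""
    rng = random.Random(seed)
    names = list(datasets)
    probs = [weights[n] for n in names]
    while True:
        # Pick a task according to the sampling weights, then an example from it.
        name = rng.choices(names, weights=probs, k=1)[0]
        yield name, rng.choice(datasets[name])

# Hypothetical tasks: code completion sampled twice as often as test generation.
data = {"completion": ["ex_a", "ex_b"], "testgen": ["ex_c"]}
sampler = make_sampler(data, {"completion": 2.0, "testgen": 1.0})
batch = [next(sampler) for _ in range(6)]
print(batch)
```

Weighted interleaving like this keeps smaller datasets from being drowned out by larger ones during multitask fine-tuning.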

‎README_cn.md

Lines changed: 16 additions & 9 deletions
@@ -45,11 +45,17 @@
 ## 新闻 (News)
-🔥🔥🔥 [2024/01/17] **MFTCoder-v0.3.0** released. Adds support for models such as Mixtral (MoE) and DeepSeek; adds FSDP (Fully Sharded Data Parallel); adds Self-paced Loss for balanced convergence across tasks. For details, see the CodeFuse WeChat article [MFTCoder 重磅升级v0.3.0发布](https://mp.weixin.qq.com/s/xI3f0iUKq9TIIKZ_kMtcQg).
+🔥🔥🔥 [2024/05/20] **MFTCoder-v0.4.2** released. Adds the **QLoRA + DeepSpeed Zero3** and **QLoRA + FSDP** training modes, better supporting fine-tuning of larger models such as Qwen1.5-70B. Adds support for models such as Qwen2, Qwen2-MoE, Starcoder2, and Gemma.
-🔥🔥🔥 [2024/01/17] Released the [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) model, reaching 78.7% pass@1 (greedy decoding) on HumanEval. Its Big Code leaderboard result will be published soon; follow the WeChat account for updates.
+🔥🔥🔥 [2024/05/20] Our paper [MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning](https://arxiv.org/abs/2311.02303) has been accepted by KDD 2024.
-🔥🔥🔥 [2024/01/17] Released the [CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) model, reaching 56.1% pass@1 (greedy decoding) on HumanEval. For details, see the CodeFuse WeChat article [MFTCoder提升Mixtral-8x7B混合专家模型的代码能力实践](https://mp.weixin.qq.com/s/xI3f0iUKq9TIIKZ_kMtcQg).
+🔥🔥🔥 Released the [CodeFuse-StarCoder2-15B](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder2-15B) model, reaching 73.2% on HumanEval with balanced capability across multiple programming languages.
+🔥🔥 [2024/01/17] **MFTCoder-v0.3.0** released. Adds support for models such as Mixtral (MoE) and DeepSeek; adds FSDP (Fully Sharded Data Parallel); adds Self-paced Loss for balanced convergence across tasks. For details, see the CodeFuse WeChat article [MFTCoder 重磅升级v0.3.0发布](https://mp.weixin.qq.com/s/xI3f0iUKq9TIIKZ_kMtcQg).
+🔥🔥 [2024/01/17] Released the [CodeFuse-DeepSeek-33B](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) model, reaching 78.7% pass@1 (greedy decoding) on HumanEval. Its Big Code leaderboard result will be published soon; follow the WeChat account for updates.
+🔥🔥 [2024/01/17] Released the [CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) model, reaching 56.1% pass@1 (greedy decoding) on HumanEval. For details, see the CodeFuse WeChat article [MFTCoder提升Mixtral-8x7B混合专家模型的代码能力实践](https://mp.weixin.qq.com/s/xI3f0iUKq9TIIKZ_kMtcQg).
 🔥🔥 [2023/11/07] The [MFTCoder paper](https://arxiv.org/abs/2311.02303) was published on Arxiv, presenting the technical details of multitask fine-tuning.
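The Self-paced Loss mentioned in the v0.3.0 entry balances convergence across tasks by re-weighting each task's contribution to the total loss according to how far it remains from converging. The exact formulation is given in the MFTCoder paper; the following is only an illustrative sketch of the idea, in which slower-converging tasks (higher validation loss) receive larger weights:

```python
def self_paced_weights(valid_losses: dict[str, float], gamma: float = 1.0) -> dict[str, float]:
    """Illustrative task re-weighting: weight_i proportional to valid_loss_i ** gamma.

    NOT MFTCoder's exact formula (see the paper); this only demonstrates that
    tasks further from convergence contribute more to the combined loss.
    """
    raw = {task: loss ** gamma for task, loss in valid_losses.items()}
    total = sum(raw.values())
    return {task: w / total for task, w in raw.items()}

# Hypothetical per-task validation losses:
w = self_paced_weights({"code_completion": 0.8, "text2code": 1.2})
print(w)  # → {'code_completion': 0.4, 'text2code': 0.6}
```

The per-task weights are recomputed as training progresses, so no single task dominates the multitask objective.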
@@ -69,6 +75,7 @@
 | **CodeFuse-DeepSeek-33B** | **78.7%** | 2024/01 |
 | **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
 | **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
+| **CodeFuse-StarCoder2-15B** | **73.2%** | 2023/05 |
 | WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
 | GPT-4 (zero-shot) | 67.0% | 2023/03 |
 | PanGu-Coder2 15B | 61.6% | 2023/08 |
@@ -118,12 +125,12 @@
 ## 环境 (Requirements)
-First, make sure CUDA (>= 11.4, 11.7 recommended) and its drivers are installed and working properly, and install a basic torch (>= 2.0.0).
+First, make sure CUDA (>= 11.4, 12.1 recommended) and its drivers are installed and working properly, and install a basic torch (>= 2.1.0).
 The versions of the main Python packages are pinned in requirements.txt; just run the following script:
 ```bash
 sh init_env.sh
 ```
-We strongly recommend installing flash attention (>= 2.1.0, 2.3.6 recommended); for installation, see https://github.com/Dao-AILab/flash-attention
+We strongly recommend installing flash attention (>= 2.3.0); for installation, see https://github.com/Dao-AILab/flash-attention
 
 ## 训练 (Training)
 If you are familiar with the mainstream open-source resources for large-model training, such as ```transformers```, ```DeepSpeed```, and ```FSDP```, and want to get started quickly with high-performance fine-tuning via open-source projects, we suggest you try:
@@ -145,11 +152,11 @@
 | 🔥🔥🔥 CodeFuse-DeepSeek-33B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-DeepSeek-33B) | DeepSeek-coder-33B | 600K | 80 | 4096 |
 | 🔥🔥🔥 CodeFuse-Mixtral-8x7B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Mixtral-8x7B) | Mixtral-8x7B | 600K | 80 | 4096 |
 | 🔥🔥🔥 CodeFuse-CodeLlama-34B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 600K | 80 | 4096 |
 | 🔥🔥🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
 | 🔥🔥🔥 CodeFuse-StarCoder-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder-15B) | StarCoder-15B | 600K | 80 | 4096 |
 | 🔥🔥🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 1.1M | 256 | 4096 |
 | 🔥🔥🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 1.1M | 256 | 4096 |
+| 🔥🔥🔥 CodeFuse-StarCoder2-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder2-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder2-15B) | Starcoder2-15B | 700K | 128 | 4096 |