README.md: 4 additions & 0 deletions

@@ -46,6 +46,10 @@

## News
+🔥🔥🔥 [2024/10/31] We released **MFTCoder v0.5**, mainly for MFTCoder-accelerate. It now supports preference-alignment methods such as **DPO/RPO/ORPO** in the new **xxpo** module, adds full-parameter continue training in the new **mpt** module together with its **offline_tokenization** module, and upgrades the self-paced method to the new Convergence Balance (CoBa) method for MFT in the original **pefts** module.
+
+🔥🔥🔥 [2024/10/31] Our paper [CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models](https://arxiv.org/abs/2410.06741) has been accepted by EMNLP 2024; it achieves balanced convergence across various tasks.
+
🔥🔥🔥 [2024/05/20] We released **MFTCoder v0.4**, mainly for MFTCoder-accelerate. It supports **QLoRA + DeepSpeed Zero3** and **QLoRA + FSDP** as options, allowing you to train very large models. It also supports new models such as Qwen2, Qwen2-MoE, Starcoder2, and Gemma.

🔥🔥🔥 [2024/05/20] Our paper [MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning](https://arxiv.org/abs/2311.02303) has been accepted by KDD 2024.
🔥🔥🔥 [2024/10/31] Our paper [CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models](https://arxiv.org/abs/2410.06741) has been accepted by EMNLP 2024 and achieves balanced convergence across multiple tasks.

mftcoder_accelerate/README.md: 92 additions & 10 deletions

@@ -7,22 +7,28 @@
[[中文]](README_cn.md) [**English**]

## 1. Updates
+🔥 MFTCoder-accelerate now supports DPO/ORPO training through the xxpo module.
+
+🔥 MFTCoder-accelerate now supports continue training through the mpt module, together with the offline_tokenization module.
+
+🔥 MFTCoder-accelerate supports MFT with the latest implementation of CoBa Loss (self-paced Loss) for better Convergence Balance.
+
🔥 MFTCoder-accelerate now supports these modes: QLoRA/LoRA + DeepSpeed ZeRO2, QLoRA + DeepSpeed ZeRO3, Full-parameter + DeepSpeed ZeRO3, QLoRA + FSDP, Full-parameter + FSDP.

-🔥 MFTCoder-accelerate supports QLoRA + DeepSpeed ZeRO3 and QLoRA + FSDP, which both work for larger models;
+🔥 MFTCoder-accelerate supports QLoRA + DeepSpeed ZeRO3 and QLoRA + FSDP, which both work for larger models.

-🔥 MFTCoder-accelerate supports MFT/SFT on more new mainstream open-source base models: mistral, mixtral-8x7b (Mixture of Experts), deepseek, chatglm3;
+🔥 MFTCoder-accelerate supports MFT/SFT on more new mainstream open-source base models: mistral, mixtral-8x7b (Mixture of Experts), deepseek, chatglm3.

-🔥 MFTCoder-accelerate supports Self-Paced Loss for Convergence Balance;
+🔥 MFTCoder-accelerate supports Self-Paced Loss for Convergence Balance.

-🔥 MFTCoder-accelerate supports Full-parameters/QLoRA/LoRA using the accelerate + DeepSpeed framework;
+🔥 MFTCoder-accelerate supports Full-parameters/QLoRA/LoRA using the accelerate + DeepSpeed framework.

🔥 MFTCoder-accelerate supports Multitask Fine-Tuning (MFT), which is able to balance different tasks at the data level.

🔥 MFTCoder-accelerate supports fine-tuning most mainstream open-source base models: codellama, llama2, llama, starcoder, codegeex2, chatglm2, qwen.

## 2. Data Format
-### 2.1 Training Data Format
+### 2.1 MFT Training Data Format
The training data is required to be in a uniform JSONL format, in which each line of data has the following "chatML"-style JSON format. The "chat_rounds" field is required, and other fields can be added or removed based on specific needs.
We selected the "chatML" style as our training and inference data format because it is compatible with both "conversation" and "instruction/response" scenarios.
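For orientation, a minimal chatML-style sample might look like the sketch below. Only the "chat_rounds" field is required; the roles follow the "system/human/bot" convention used in this document, and the concrete contents are placeholders rather than values taken from the repo.

```json
{
    "chat_rounds": [
        {
            "role": "system",
            "content": "You are an expert in coding and help answer code questions"
        },
        {
            "role": "human",
            "content": "Write a python function of quick sort"
        },
        {
            "role": "bot",
            "content": "Below is the function of quick sort: ..."
        }
    ]
}
```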
@@ -57,7 +63,7 @@ For the keys of roles in "chat_rounds", you could use "system/human/bot" tuple o
}
```

-### 2.2 Default Inference Data Format
+### 2.2 Default MFTCoder Inference Template
The inference data format is the actual string format consumed by the tokenizer and then by the LLM. It is also the string format to which the training data is converted before tokenization.
The default inference data format is the string obtained by concatenating the conversation data (system, human, and bot contents) from the training data format.
It is the data "seen" by the model (before tokenization) during training.
@@ -87,6 +93,56 @@ User nth round input
```
When running inference, always make your input string end with ```<s>bot\n``` to request that the model generate an answer.
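For intuition, with the default role markers a single-round inference prompt would look roughly like the sketch below; the exact whitespace and end-of-sequence handling follow the repo's template, so treat this as an approximation rather than the authoritative format.

```
<s>system
You are an expert in coding and help answer code questions.
<s>human
Write a python function of quick sort
<s>bot
```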

+### 2.3 DPO Training Data Format
+The training data is required to be in a uniform JSONL format, in which each line of data has the following JSON format. The "chosen" and "rejected" fields are required, serving as the ```chosen``` and ```rejected``` samples in DPO training; both contain "chatml"-style contents, and only the last bot content differs between them.
+```json
+{
+    "chosen": [
+        {
+            "role": "system",
+            "content": "You are an expert in coding and help answer code questions"
+        },
+        {
+            "role": "human",
+            "content": "Write a python function of quick sort"
+        },
+        {
+            "role": "bot",
+            "content": "Below is the function of quick sort: ..."
+        },
+        {
+            "role": "human",
+            "content": "Explain the code"
+        },
+        {
+            "role": "bot",
+            "content": "OK, this code ..."
+        }
+    ],
+    "rejected": [
+        {
+            "role": "system",
+            "content": "You are an expert in coding and help answer code questions"
+        },
+        {
+            "role": "human",
+            "content": "Write a python function of quick sort"
+        },
+        {
+            "role": "bot",
+            "content": "Below is the function of quick sort: ..."
+        },
+        {
+            "role": "human",
+            "content": "Explain the code"
+        },
+        {
+            "role": "bot",
+            "content": "..."
+        }
+    ]
+}
+```

-The entry directory for fine-tuning training is ```mftcoder_accelerate/src```, and the entry file for training is ```mftcoder_accelerate/src/pefts/mft_accelerate.py```.
+You can find the implementations in the ```mftcoder_accelerate/src``` directory.
+The entry file for MFT training is ```mftcoder_accelerate/src/pefts/mft_accelerate.py```.
+
+The entry file for DPO/ORPO training is ```mftcoder_accelerate/src/xxpo/xxpo_accelerate.py```.
+
+The entry file for MPT (Continue Training) is ```mftcoder_accelerate/src/mpt/mpt_accelerate.py```. You need to finish the offline tokenization of your data via ```mftcoder_accelerate/src/run_offline_tokenization.sh```, which differs from the online tokenization used in MFT/DPO.
+
Configurations are stored in the ```mftcoder_accelerate/src/configs``` directory for easy management and modification.

**_As a result, before you start training, you should first change your dir by_**
```
cd mftcoder_accelerate/src
```
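From there, a launch of the MFT entry point might look like the following sketch. The accelerate config file name (```accelerate_ds_config.yaml```) and the ```--train_config``` flag shown here are assumptions for illustration only; consult the actual scripts and ```configs/***_train_config``` files in the repo for the authoritative invocation.

```
accelerate launch --config_file accelerate_ds_config.yaml pefts/mft_accelerate.py --train_config configs/xxx_train_config.json
```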

-### 3.1 Tokenization
+### 3.1 MFT Tokenization
During training, we concatenate multi-turn dialogues into the following format (also known as the inference data format mentioned before) and then tokenize it.

In the default format, ```<s>human\n``` starts the user's input (i.e., prompt) and ```<s>bot\n``` starts the assistant's output (i.e., response).
@@ -271,6 +342,17 @@ Frequently used arguments are provided in ```configs/***_train_config``` and exp
- **role_markers**: {"system": "\<s\>system\n", "user": "\<s\>human\n", "assistant": "\<s\>bot\n"} is used when this is left at its default (null). You can set your preferred role_markers as the templates starting the "system", "user", and "assistant" parts, e.g. {"system": "### System:\n", "user": "### Instruction:\n", "assistant": "### Response:\n"}.

+#### CoBa Arguments Configuration
+- **coba_warmup_steps**: The number of warm-up steps for CoBa. During warm-up, all task weights are equal; afterwards, the weights begin to be adjusted dynamically. It is generally recommended to set this close to the total number of validation batches.
+- **coba_history_length**: The length of the validation-loss history window maintained by CoBa, used to fit the convergence slope at the current step. It is generally recommended to set this to between 2 and 5 times **coba_warmup_steps**. Typically, the larger this value, the smaller the changes in the weights will be.
+- **coba_tau**: The temperature coefficient of the Divergence Factor (DF). It is generally set to 5.
+- **coba_update_interval**: The frequency at which CoBa updates the weights. It is commonly set to 1, meaning the weights are updated at every step.
+- **coba_sample_valid_num**: The number of validation batches sampled by CoBa at each step. In theory, when this value equals the total number of validation batches, the fitted convergence slope most closely approximates the actual one; considering the computational cost, however, it is recommended to set it to 1.
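Putting the recommendations above together, a CoBa section of a ```configs/***_train_config``` file might look like the sketch below; the concrete numbers (e.g., 64 validation batches for warm-up and a history window of 3x that) are illustrative assumptions, not values taken from the repo.

```json
{
    "coba_warmup_steps": 64,
    "coba_history_length": 192,
    "coba_tau": 5,
    "coba_update_interval": 1,
    "coba_sample_valid_num": 1
}
```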
+
+#### DPO Arguments Configuration
+- **xxpo**: The preference-optimization type, either "dpo" or "orpo".
+- **beta**: The DPO beta; a smaller beta allows a larger distance between the DPO model and the reference model.
+- **rpo_alpha**: The coefficient of the ```chosen``` NLL loss added to the DPO loss.
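Similarly, a hypothetical preference-alignment block in a train config could look like the sketch below; ```"beta": 0.1``` and ```"rpo_alpha": 1.0``` are illustrative choices, not repo defaults.

```json
{
    "xxpo": "dpo",
    "beta": 0.1,
    "rpo_alpha": 1.0
}
```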