README.md: 4 additions & 0 deletions

@@ -46,6 +46,10 @@

## News
+🔥🔥🔥 [2024/10/31] We released **MFTCoder v0.5**, mainly for MFTCoder-accelerate. It now supports preference-alignment methods such as **DPO/RPO/ORPO** in the new **xxpo** module, adds full-parameter continue training in the new **mpt** module together with its **offline_tokenization** module, and upgrades the self-paced method to the new Convergence Balance (CoBa) method for MFT in the original **pefts** module.
+
+🔥🔥🔥 [2024/10/31] Our paper [CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models](https://arxiv.org/abs/2410.06741) has been accepted by EMNLP 2024; it achieves balanced convergence across various tasks.
+
🔥🔥🔥 [2024/05/20] We released **MFTCoder v0.4**, mainly for MFTCoder-accelerate. It supports **QLoRA + DeepSpeed Zero3** and **QLoRA + FSDP** as options, allowing you to train very large models. It also supports new models such as Qwen2, Qwen2-MoE, Starcoder2, and Gemma.

🔥🔥🔥 [2024/05/20] Our paper [MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning](https://arxiv.org/abs/2311.02303) has been accepted by KDD 2024.
🔥🔥🔥 [2024/10/31] Our paper [CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models](https://arxiv.org/abs/2410.06741) has been accepted by EMNLP 2024 and achieves balanced convergence across multiple tasks.

mftcoder_accelerate/README.md: 92 additions & 10 deletions

@@ -7,22 +7,28 @@
[[中文]](README_cn.md) [**English**]

## 1. Updates
+🔥 MFTCoder-accelerate now supports DPO/ORPO training through the xxpo module.
+
+🔥 MFTCoder-accelerate now supports continue training through the mpt module, together with the offline_tokenization module.
+
+🔥 MFTCoder-accelerate supports MFT with the latest implementation of CoBa Loss (self-paced Loss) for better Convergence Balance.
+
🔥 MFTCoder-accelerate now supports these modes: QLoRA/LoRA + DeepSpeed ZeRO2, QLoRA + DeepSpeed ZeRO3, Full-parameter + DeepSpeed ZeRO3, QLoRA + FSDP, Full-parameter + FSDP.

-🔥 MFTCoder-accelerate supports QLoRA + DeepSpeed ZeRO3 and QLoRA + FSDP, which both work for larger models;
+🔥 MFTCoder-accelerate supports QLoRA + DeepSpeed ZeRO3 and QLoRA + FSDP, which both work for larger models.

-🔥 MFTCoder-accelerate supports MFT/SFT on more new mainstream open-source base models: mistral, mixtral-8x7b (Mixture of Experts), deepseek, chatglm3;
+🔥 MFTCoder-accelerate supports MFT/SFT on more new mainstream open-source base models: mistral, mixtral-8x7b (Mixture of Experts), deepseek, chatglm3.

-🔥 MFTCoder-accelerate supports Self-Paced Loss for Convergence Balance;
+🔥 MFTCoder-accelerate supports Self-Paced Loss for Convergence Balance.

-🔥 MFTCoder-accelerate supports Full-parameters/QLoRA/LoRA using the accelerate + DeepSpeed framework;
+🔥 MFTCoder-accelerate supports Full-parameters/QLoRA/LoRA using the accelerate + DeepSpeed framework.

🔥 MFTCoder-accelerate supports Multitask Fine-Tuning (MFT), which is able to balance different tasks at the data level.

🔥 MFTCoder-accelerate supports fine-tuning most mainstream open-source base models: codellama, llama2, llama, starcoder, codegeex2, chatglm2, qwen.

## 2. Data Format
-### 2.1 Training Data Format
+### 2.1 MFT Training Data Format
The training data is required to be in a uniform JSONL format, in which each line of data has the following "chatML"-style JSON format. The "chat_rounds" field is required, and other fields can be added or removed based on specific needs.
We selected the "chatML" style as our training and inference data format because it is compatible with both "conversation" and "instruction/response" scenarios.
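For orientation, a minimal chatML-style sample might look like the sketch below. Only the "chat_rounds" field is required; the roles follow the "system/human/bot" convention used in this document, and the concrete contents are placeholders rather than values taken from the repo.

```json
{
    "chat_rounds": [
        {
            "role": "system",
            "content": "You are an expert in coding and help answer code questions"
        },
        {
            "role": "human",
            "content": "Write a python function of quick sort"
        },
        {
            "role": "bot",
            "content": "Below is the function of quick sort: ..."
        }
    ]
}
```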
@@ -57,7 +63,7 @@ For the keys of roles in "chat_rounds", you could use "system/human/bot" tuple o
}
```

-### 2.2 Default Inference Data Format
+### 2.2 Default MFTCoder Inference Template
The inference data format is the actual string format consumed by the tokenizer and then by the LLM. It is also the string format to which the training data is converted before tokenization.
The default inference data format is the string obtained by concatenating the conversation data (system, human, and bot contents) from the training data format.
It is the data "seen" by the model (before tokenization) during training.
@@ -87,6 +93,56 @@ User nth round input
```
When running inference, always make your input string end with ```<s>bot\n``` to request that the model generate an answer.
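For intuition, with the default role markers a single-round inference prompt would look roughly like the sketch below; the exact whitespace and end-of-sequence handling follow the repo's template, so treat this as an approximation rather than the authoritative format.

```
<s>system
You are an expert in coding and help answer code questions.
<s>human
Write a python function of quick sort
<s>bot
```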

+### 2.3 DPO Training Data Format
+The training data is required to be in a uniform JSONL format, in which each line of data has the following JSON format. The "chosen" and "rejected" fields are required, serving as the ```chosen``` and ```rejected``` samples in DPO training; both contain "chatml"-style contents, and only the last bot content differs between them.
+```json
+{
+    "chosen": [
+        {
+            "role": "system",
+            "content": "You are an expert in coding and help answer code questions"
+        },
+        {
+            "role": "human",
+            "content": "Write a python function of quick sort"
+        },
+        {
+            "role": "bot",
+            "content": "Below is the function of quick sort: ..."
+        },
+        {
+            "role": "human",
+            "content": "Explain the code"
+        },
+        {
+            "role": "bot",
+            "content": "OK, this code ..."
+        }
+    ],
+    "rejected": [
+        {
+            "role": "system",
+            "content": "You are an expert in coding and help answer code questions"
+        },
+        {
+            "role": "human",
+            "content": "Write a python function of quick sort"
+        },
+        {
+            "role": "bot",
+            "content": "Below is the function of quick sort: ..."
+        },
+        {
+            "role": "human",
+            "content": "Explain the code"
+        },
+        {
+            "role": "bot",
+            "content": "..."
+        }
+    ]
+}
+```

-The entry directory for fine-tuning training is ```mftcoder_accelerate/src```, and the entry file for training is ```mftcoder_accelerate/src/pefts/mft_accelerate.py```.
+You can find the implementations in the ```mftcoder_accelerate/src``` directory.
+The entry file for MFT training is ```mftcoder_accelerate/src/pefts/mft_accelerate.py```.
+
+The entry file for DPO/ORPO training is ```mftcoder_accelerate/src/xxpo/xxpo_accelerate.py```.
+
+The entry file for MPT (Continue Training) is ```mftcoder_accelerate/src/mpt/mpt_accelerate.py```. You need to finish the offline tokenization of your data via ```mftcoder_accelerate/src/run_offline_tokenization.sh```, which differs from the online tokenization used in MFT/DPO.
+
Configurations are stored in the ```mftcoder_accelerate/src/configs``` directory for easy management and modification.

**_As a result, before you start training, you should first change your dir by_**
```
cd mftcoder_accelerate/src
```
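From there, a launch of the MFT entry point might look like the following sketch. The accelerate config file name (```accelerate_ds_config.yaml```) and the ```--train_config``` flag shown here are assumptions for illustration only; consult the actual scripts and ```configs/***_train_config``` files in the repo for the authoritative invocation.

```
accelerate launch --config_file accelerate_ds_config.yaml pefts/mft_accelerate.py --train_config configs/xxx_train_config.json
```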

-### 3.1 Tokenization
+### 3.1 MFT Tokenization
During training, we concatenate multi-turn dialogues into the following format (also known as the inference data format mentioned before) and then tokenize it.

In the default format, ```<s>human\n``` starts the user's input (i.e., prompt) and ```<s>bot\n``` starts the assistant's output (i.e., response).
@@ -271,6 +342,17 @@ Frequently used arguments are provided in ```configs/***_train_config``` and exp
- **role_markers**: {"system": "\<s\>system\n", "user": "\<s\>human\n", "assistant": "\<s\>bot\n"} is used when this is left at its default (null). You can set your preferred role_markers as the templates starting the "system", "user", and "assistant" parts, e.g. {"system": "### System:\n", "user": "### Instruction:\n", "assistant": "### Response:\n"}.

+#### CoBa Arguments Configuration
+- **coba_warmup_steps**: The number of warm-up steps for CoBa. During warm-up, all task weights are equal; afterwards, the weights begin to be adjusted dynamically. It is generally recommended to set this close to the total number of validation batches.
+- **coba_history_length**: The length of the validation-loss history window maintained by CoBa, used to fit the convergence slope at the current step. It is generally recommended to set this to between 2 and 5 times **coba_warmup_steps**. Typically, the larger this value, the smaller the changes in the weights will be.
+- **coba_tau**: The temperature coefficient of the Divergence Factor (DF). It is generally set to 5.
+- **coba_update_interval**: The frequency at which CoBa updates the weights. It is commonly set to 1, meaning the weights are updated at every step.
+- **coba_sample_valid_num**: The number of validation batches sampled by CoBa at each step. In theory, when this value equals the total number of validation batches, the fitted convergence slope most closely approximates the actual one; considering the computational cost, however, it is recommended to set it to 1.
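Putting the recommendations above together, a CoBa section of a ```configs/***_train_config``` file might look like the sketch below; the concrete numbers (e.g., 64 validation batches for warm-up and a history window of 3x that) are illustrative assumptions, not values taken from the repo.

```json
{
    "coba_warmup_steps": 64,
    "coba_history_length": 192,
    "coba_tau": 5,
    "coba_update_interval": 1,
    "coba_sample_valid_num": 1
}
```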
+
+#### DPO Arguments Configuration
+- **xxpo**: The preference-optimization type, either "dpo" or "orpo".
+- **beta**: The DPO beta; a smaller beta allows a larger distance between the DPO model and the reference model.
+- **rpo_alpha**: The coefficient of the ```chosen``` NLL loss added to the DPO loss.
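Similarly, a hypothetical preference-alignment block in a train config could look like the sketch below; ```"beta": 0.1``` and ```"rpo_alpha": 1.0``` are illustrative choices, not repo defaults.

```json
{
    "xxpo": "dpo",
    "beta": 0.1,
    "rpo_alpha": 1.0
}
```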