Phi 3

Microsoft’s Phi-3-mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters. It belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, named for the context length (in tokens) that each supports.

We provide pre-defined recipes for pretraining and finetuning the Phi-3-mini-4K-Instruct model. The recipes use NeMo 2.0 and NeMo-Run. Each recipe configures a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0. The recipes are hosted in the phi3_mini_4k_instruct file.

NeMo 2.0 Pretraining Recipes

Note

The pretraining recipes use the MockDataModule for the data argument. You are expected to replace the MockDataModule with your custom dataset.

We provide an example below on how to invoke the default recipe and override the data argument:

    from nemo.collections import llm

    pretrain = llm.phi3_mini_4k_instruct.pretrain_recipe(
        name="phi3_mini_4k_instruct_pretraining",
        dir="/path/to/checkpoints",
        num_nodes=1,
        num_gpus_per_node=8,
    )

    # # To override the data argument
    # dataloader = a_function_that_configures_your_custom_dataset(
    #     gbs=gbs,
    #     mbs=mbs,
    #     seq_length=pretrain.model.config.seq_length,
    # )
    # pretrain.data = dataloader
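The commented-out override above takes a global batch size (gbs) and a micro batch size (mbs). As a rough sketch of how the two usually relate in Megatron-style data-parallel training, the helper below derives how many micro-batches are accumulated per optimizer step. The function name and its parallelism parameters are hypothetical and not part of the NeMo API; the example values for gbs and mbs are made up.

```python
def grad_accumulation_steps(gbs, mbs, num_nodes, num_gpus_per_node,
                            tensor_parallel=1, pipeline_parallel=1):
    """Return micro-batches accumulated per optimizer step (illustrative helper)."""
    world_size = num_nodes * num_gpus_per_node
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by TP * PP")
    # GPUs that each hold a full model replica and see different data.
    data_parallel = world_size // model_parallel
    samples_per_pass = mbs * data_parallel
    if gbs % samples_per_pass != 0:
        raise ValueError("gbs must be divisible by mbs * data-parallel size")
    return gbs // samples_per_pass


# 1 node x 8 GPUs as in the recipe call above; gbs and mbs are example values.
steps = grad_accumulation_steps(gbs=512, mbs=1, num_nodes=1, num_gpus_per_node=8)
print(steps)
```

A check like this can catch an incompatible gbs/mbs pair before a job is launched, under the assumption that the divisibility rule above applies to your setup.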

NeMo 2.0 Finetuning Recipes

Note

The finetuning recipes use the SquadDataModule for the data argument. You can replace the SquadDataModule with your custom dataset.

To import the HF model and convert it to NeMo 2.0 format, run the following script (this only needs to be done once):

    from nemo.collections.llm import import_ckpt
    from nemo.collections.llm.gpt.model.phi3mini import Phi3ConfigMini, Phi3Model

    if __name__ == "__main__":
        import_ckpt(
            model=Phi3Model(Phi3ConfigMini()),
            source='hf://microsoft/Phi-3-mini-4k-instruct',
        )

We provide an example below on how to invoke the default recipe and override the data argument:

    from nemo.collections import llm

    recipe = llm.phi3_mini_4k_instruct.finetune_recipe(
        name="phi3_mini_4k_instruct_finetuning",
        dir="/path/to/checkpoints",
        num_nodes=1,
        num_gpus_per_node=1,
        peft_scheme='lora',  # 'lora', 'none'
        packed_sequence=None,
    )

    # # To override the data argument
    # dataloader = a_function_that_configures_your_custom_dataset(
    #     gbs=gbs,
    #     mbs=mbs,
    #     seq_length=recipe.model.config.seq_length,
    # )
    # recipe.data = dataloader

By default, the finetuning recipe will run LoRA finetuning with LoRA applied to all linear layers in the language model. To finetune the entire model without LoRA, set peft_scheme='none' in the recipe argument.
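To give a sense of why LoRA is cheaper than full finetuning, the sketch below compares trainable parameter counts for a single linear layer. LoRA keeps the dense weight frozen and trains two low-rank factors, A (rank x d_in) and B (d_out x rank). The layer size and rank used here are illustrative assumptions, not values read from the Phi-3 configuration or from NeMo's LoRA defaults.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_params(d_in, d_out):
    """Parameters in the dense weight matrix (bias ignored)."""
    return d_in * d_out


# Hypothetical square projection and rank, chosen only for illustration.
d_in = d_out = 3072
rank = 16
lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
print(f"LoRA: {lora:,}  full: {full:,}  ratio: {lora / full:.2%}")
```

Under these assumed sizes the adapter trains roughly one percent of the parameters of the layer it wraps, which is the main reason LoRA can fit on fewer GPUs than full finetuning.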

To finetune with sequence packing for higher throughput, set packed_sequence=True. Note that you may need to tune the global batch size in order to achieve similar convergence.
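The idea behind sequence packing can be sketched in a few lines: several short examples are concatenated into one row of the maximum sequence length, so fewer tokens are spent on padding. The function below uses a simple first-fit-decreasing heuristic purely for illustration; it is not NeMo's packing implementation, and the token counts are made up.

```python
def pack_first_fit(lengths, max_len):
    """Greedily place each sequence into the first bin with enough room."""
    bins = []  # each bin is a list of sequence lengths sharing one packed row
    for n in sorted(lengths, reverse=True):
        if n > max_len:
            raise ValueError(f"sequence of length {n} exceeds max_len={max_len}")
        for b in bins:
            if sum(b) + n <= max_len:
                b.append(n)
                break
        else:
            # No existing bin had room, so open a new packed row.
            bins.append([n])
    return bins


lengths = [3000, 1000, 900, 2000, 2000, 100]  # made-up token counts
packed = pack_first_fit(lengths, max_len=4096)
padded_tokens_unpacked = len(lengths) * 4096 - sum(lengths)
padded_tokens_packed = len(packed) * 4096 - sum(lengths)
print(len(lengths), "sequences ->", len(packed), "packed rows")
print("padding tokens:", padded_tokens_unpacked, "->", padded_tokens_packed)
```

Because each packed row now holds several examples, a given global batch size (counted in rows) covers more examples per optimizer step than it would without packing, which is one likely reason the batch size may need retuning.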

Note

The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.
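For readers new to this pattern, the toy class below mimics the general idea of a deferred-call configuration object: it records a function together with its arguments, lets you override an argument by attribute assignment, and only calls the function when asked. It is a hypothetical stand-in written for illustration and does not reproduce NeMo-Run's actual run.Partial implementation or API.

```python
class PartialSketch:
    """Toy stand-in: stores a callable and keyword arguments for later execution."""

    def __init__(self, fn, **kwargs):
        # Bypass __setattr__ so the bookkeeping fields are stored as plain attributes.
        object.__setattr__(self, "_fn", fn)
        object.__setattr__(self, "_kwargs", dict(kwargs))

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        kwargs = self.__dict__.get("_kwargs", {})
        if name in kwargs:
            return kwargs[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Attribute assignment overrides a stored argument before execution.
        self._kwargs[name] = value

    def build(self):
        """Call the stored function with the (possibly overridden) arguments."""
        return self._fn(**self._kwargs)


def train(data, num_nodes, num_gpus_per_node):
    # Placeholder for a real training entry point.
    return f"training on {data} with {num_nodes * num_gpus_per_node} GPU(s)"


recipe = PartialSketch(train, data="mock-data", num_nodes=1, num_gpus_per_node=8)
recipe.data = "my-custom-data"  # override before execution
result = recipe.build()
print(result)
```

This is the same shape as assigning a new value to the data attribute of a recipe: nothing runs until the configured object is handed to an executor.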

Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process. You can use it as follows:

    import nemo_run as run

    run.run(pretrain, executor=run.LocalExecutor())

You can also run it directly in the same Python process as follows:

    run.run(pretrain, direct=True)

A comprehensive list of pretraining recipes that we currently support or plan to support soon is provided below for reference:

Recipe                      Status
Phi 3 mini 4k instruct      Yes
Phi 3 mini 128k instruct    N/A
Phi 3 small 8k instruct     N/A