# mergekit

`mergekit` is a toolkit for merging pre-trained language models. `mergekit` uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.
- Why Merge Models?
- Features
- Installation
- Usage
- Merge Configuration
- Merge Methods
- LoRA extraction
- Mixture of Experts merging
- Evolutionary merge methods
- Merge in the Cloud
- Citation
## Why Merge Models?

Model merging is a powerful technique that allows combining the strengths of different models without the computational overhead of ensembling or the need for additional training. By operating directly in the weight space of models, merging can:
- Combine multiple specialized models into a single versatile model
- Transfer capabilities between models without access to training data
- Find optimal trade-offs between different model behaviors
- Improve performance while maintaining inference costs
- Create new capabilities through creative model combinations
Unlike traditional ensembling, which requires running multiple models, a merged model maintains the same inference cost as a single model while often achieving comparable or superior performance.
## Features

Key features of `mergekit` include:
- Supports Llama, Mistral, GPT-NeoX, StableLM, and more
- Many merge methods
- GPU or CPU execution
- Lazy loading of tensors for low memory use
- Interpolated gradients for parameter values (inspired by Gryphe's BlockMerge_Gradient script)
- Piecewise assembly of language models from layers ("Frankenmerging")
- Mixture of Experts merging
- LoRA extraction
- Evolutionary merge methods
🌐 GUI Launch Alert 🤗 - We are excited to announce the launch of a mega-GPU backed graphical user interface for mergekit in Arcee! This GUI simplifies the merging process, making it more accessible to a broader audience. Check it out and contribute at the Arcee App. There is also a Hugging Face Space with limited GPU resources.
## Installation

```sh
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .  # install the package and make scripts available
```
If the above fails with an error like:

```
ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode:
(A "pyproject.toml" file was found, but editable mode currently requires a setuptools-based build.)
```

you may need to upgrade pip to > 21.3 with the command `python3 -m pip install --upgrade pip`.
## Usage

The script `mergekit-yaml` is the main entry point for `mergekit`. It takes a YAML configuration file and an output path, like so:

```sh
mergekit-yaml path/to/your/config.yml ./output-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]
```

This will run the merge and write your merged model to `./output-model-directory`.

For more information on the arguments accepted by `mergekit-yaml`, run the command `mergekit-yaml --help`.
### Uploading to Hugging Face

When you have a merged model you're happy with, you may want to share it on the Hugging Face Hub. `mergekit` generates a `README.md` for your merge with some basic information for a model card. You can edit it to include more details about your merge, like giving it a good name or explaining what it's good at; rewrite it entirely; or use the generated `README.md` as-is. It is also possible to edit your `README.md` online once it has been uploaded to the Hub.

Once you're happy with your model card and merged model, you can upload it to the Hugging Face Hub using the `huggingface_hub` Python library.

```sh
# log in to huggingface with an access token (must have write permission)
huggingface-cli login
# upload your model
huggingface-cli upload your_hf_username/my-cool-model ./output-model-directory .
```

The documentation for `huggingface_hub` goes into more detail about other options for uploading.
## Merge Configuration

Merge configurations are YAML documents specifying the operations to perform in order to produce your merged model. Below are the primary elements of a configuration file, followed by a minimal example:
- `merge_method`: Specifies the method to use for merging models. See Merge Methods for a list.
- `slices`: Defines slices of layers from different models to be used. This field is mutually exclusive with `models`.
- `models`: Defines entire models to be used for merging. This field is mutually exclusive with `slices`.
- `base_model`: Specifies the base model used in some merging methods.
- `parameters`: Holds various parameters such as weights and densities, which can also be specified at different levels of the configuration.
- `dtype`: Specifies the data type used for the merging operation.
- `tokenizer` or `tokenizer_source`: Determines how to construct a tokenizer for the merged model.
- `chat_template`: Specifies a chat template for the merged model.
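For illustration, a minimal configuration exercising most of these elements might look like the following sketch (all model paths are placeholders, not real repositories):

```yaml
# Hypothetical model paths - substitute the models you actually want to merge.
merge_method: task_arithmetic
base_model: org/base-model
models:
  - model: org/finetune-a
    parameters:
      weight: 0.6
  - model: org/finetune-b
    parameters:
      weight: 0.4
dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
```

Saved as `config.yml`, this could then be run with `mergekit-yaml config.yml ./output-model-directory` as described above.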
### Parameter Specification

Parameters are flexible and can be set with varying precedence. They can be specified conditionally using tensor name filters, which allows finer control such as differentiating between attention heads and fully connected layers.
Parameters can be specified as:
- Scalars: Single floating-point values.
- Gradients: List of floating-point values, specifying an interpolated gradient.
The parameters can be set at different levels, with decreasing precedence as follows:

1. `slices.*.sources.parameters` - applying to a specific input slice
2. `slices.*.parameters` - applying to a specific output slice
3. `models.*.parameters` or `input_model_parameters` - applying to any tensors coming from specific input models
4. `parameters` - catchall
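To sketch how these pieces combine, the following `slerp` configuration (with placeholder model paths) sets the interpolation factor `t` conditionally by tensor-name filter, uses layer-wise gradients, and falls back to a scalar catchall:

```yaml
slices:
  - sources:
      - model: org/model-a            # placeholder paths
        layer_range: [0, 32]
      - model: org/model-b
        layer_range: [0, 32]
merge_method: slerp
base_model: org/model-a
parameters:
  t:
    - filter: self_attn               # applies only to attention tensors
      value: [0, 0.5, 0.3, 0.7, 1]    # gradient interpolated across layers
    - filter: mlp                     # applies only to MLP tensors
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5                      # catchall for all other tensors
dtype: bfloat16
```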
### Tokenizer Configuration

The tokenizer behavior can be configured in two ways: using the new `tokenizer` field (recommended) or the legacy `tokenizer_source` field (maintained for backward compatibility). These fields are mutually exclusive - you should use one or the other, not both.

The `tokenizer` field provides fine-grained control over vocabulary and embeddings:
tokenizer:source:"union"# or "base" or a specific model pathtokens:# Optional: configure specific tokens<token_name>:source:...# Specify embedding sourceforce:false# Optional: force this embedding for all modelspad_to_multiple_of:null# Optional: pad vocabulary size
The `source` field determines the vocabulary of the output model:

- `union`: Combine vocabularies from all input models (default)
- `base`: Use vocabulary from the base model
- `"path/to/model"`: Use vocabulary from a specific model
When merging models with different vocabularies, mergekit uses smart defaults to handle token embeddings:
- If a token exists in the base model, its embedding is used as the default
- If only one model has the token, that model's embedding is used
- Otherwise, an average of all available embeddings is used
You can override these defaults for specific tokens:
```yaml
tokenizer:
  source: union
  tokens:
    # Use embedding from a specific model
    <|im_start|>:
      source: "path/to/chatml/model"
    # Force a specific embedding for all models
    <|special|>:
      source: "path/to/model"
      force: true
    # Map a token to another model's token embedding
    <|renamed_token|>:
      source:
        kind: "model_token"
        model: "path/to/model"
        token: "<|original_token|>"  # or use token_id: 1234
```
Here's how you might preserve both Llama 3 Instruct and ChatML prompt formats when merging models:
```yaml
tokenizer:
  source: union
  tokens:
    # ChatML tokens
    <|im_start|>:
      source: "chatml_model"
    <|im_end|>:
      source: "chatml_model"
    # Llama 3 tokens - force original embeddings
    <|start_header_id|>:
      source: "llama3_model"
      force: true
    <|end_header_id|>:
      source: "llama3_model"
      force: true
    <|eot_id|>:
      source: "llama3_model"
      force: true
```
For backward compatibility, the `tokenizer_source` field is still supported:

```yaml
tokenizer_source: "union"  # or "base" or a model path
```
This provides basic tokenizer selection but lacks the fine-grained control of the modern `tokenizer` field.
### Chat Template Configuration

The optional `chat_template` field allows overriding the chat template used for the merged model.

```yaml
chat_template: "auto"  # or a template name or Jinja2 template
```
Options include:

- `"auto"`: Automatically select the most common template among input models
- Built-in templates: `"alpaca"`, `"chatml"`, `"llama3"`, `"mistral"`, `"exaone"`
- A Jinja2 template string for custom formatting
Several examples of merge configurations are available in `examples/`.
## Merge Methods

A quick overview of the currently supported merge methods:
Method | `merge_method` value | Multi-Model | Uses base model |
---|---|---|---|
Linear (Model Soups) | `linear` | ✅ | ❌ |
SLERP | `slerp` | ❌ | ✅ |
Nearswap | `nearswap` | ❌ | ✅ |
Task Arithmetic | `task_arithmetic` | ✅ | ✅ |
TIES | `ties` | ✅ | ✅ |
DARE TIES | `dare_ties` | ✅ | ✅ |
DARE Task Arithmetic | `dare_linear` | ✅ | ✅ |
Passthrough | `passthrough` | ❌ | ❌ |
Model Breadcrumbs | `breadcrumbs` | ✅ | ✅ |
Model Breadcrumbs + TIES | `breadcrumbs_ties` | ✅ | ✅ |
Model Stock | `model_stock` | ✅ | ✅ |
NuSLERP | `nuslerp` | ❌ | ✅ |
DELLA | `della` | ✅ | ✅ |
DELLA Task Arithmetic | `della_linear` | ✅ | ✅ |
SCE | `sce` | ✅ | ✅ |
### Linear

The classic merge method - a simple weighted average.

Parameters:

- `weight` - relative (or absolute if `normalize=False`) weighting of a given tensor
- `normalize` - if true, the weights of all models contributing to a tensor will be normalized. Default behavior.
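For instance, a weighted average of two models could be configured as in this sketch (placeholder model paths; weight values are illustrative):

```yaml
models:
  - model: org/model-a        # placeholder paths
    parameters:
      weight: 0.7
  - model: org/model-b
    parameters:
      weight: 0.3
merge_method: linear
parameters:
  normalize: true             # renormalize weights to sum to 1 (the default)
dtype: float16
```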
### SLERP

Spherically interpolate the parameters of two models. One must be set as `base_model`.

Parameters:

- `t` - interpolation factor. At `t=0` will return `base_model`, at `t=1` will return the other one.
### Nearswap

Interpolates the base model with the secondary model where their similarity is below `t`. Accepts two models.

Parameters:

- `t` - similarity threshold
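A minimal sketch of a nearswap configuration, assuming placeholder model paths and an illustrative threshold value:

```yaml
merge_method: nearswap
base_model: org/base-model      # placeholder paths
models:
  - model: org/secondary-model
parameters:
  t: 0.001                      # similarity threshold (illustrative)
dtype: bfloat16
```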
Computes "task vectors" for each model by subtracting a base model. Merges the task vectors linearly and adds back the base. Works great for models that were fine tuned from a common ancestor. Also a super useful mental framework for several of the more involved merge methods.
Parameters: same asLinear, plus:
lambda
- scaling factor applied after weighted sum of task vectors
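A task-arithmetic merge of two finetunes of a common base might be sketched as follows (placeholder paths, illustrative values):

```yaml
merge_method: task_arithmetic
base_model: org/base-model      # placeholder paths
models:
  - model: org/finetune-a
    parameters:
      weight: 1.0
  - model: org/finetune-b
    parameters:
      weight: 0.5
parameters:
  lambda: 0.8                   # scale the summed task vectors before adding back the base
dtype: float16
```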
### TIES

Builds on the task arithmetic framework. Resolves interference between models by sparsifying the task vectors and applying a sign consensus algorithm. Allows you to merge a larger number of models and retain more of their strengths.

Parameters: same as Task Arithmetic, plus:

- `density` - fraction of weights in differences from the base model to retain
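A sketch of a TIES merge (placeholder paths; `density: 0.5` keeps half the task-vector weights of each model):

```yaml
merge_method: ties
base_model: org/base-model      # placeholder paths
models:
  - model: org/finetune-a
    parameters:
      weight: 0.5
      density: 0.5              # keep 50% of the differences from the base
  - model: org/finetune-b
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: float16
```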
### DARE

In the same vein as TIES, sparsifies task vectors to reduce interference. Differs in that DARE uses random pruning with a novel rescaling to better match performance of the original models. DARE can be used either with the sign consensus algorithm of TIES (`dare_ties`) or without (`dare_linear`).

Parameters: same as TIES for `dare_ties`, or Linear for `dare_linear`.
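A `dare_ties` merge might be sketched like this (placeholder paths, illustrative values):

```yaml
merge_method: dare_ties
base_model: org/base-model      # placeholder paths
models:
  - model: org/finetune-a
    parameters:
      weight: 0.6
      density: 0.5              # fraction of task-vector weights kept after random pruning
  - model: org/finetune-b
    parameters:
      weight: 0.4
      density: 0.5
dtype: bfloat16
```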
### Passthrough

`passthrough` is a no-op that simply passes input tensors through unmodified. It is meant to be used for layer-stacking type merges where you have only one input model. Useful for frankenmerging.
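As a sketch, a frankenmerge that splices layer ranges from two hypothetical 32-layer models (each slice drawing from a single model) could look like:

```yaml
slices:
  - sources:
      - model: org/model-a      # placeholder 32-layer models
        layer_range: [0, 24]
  - sources:
      - model: org/model-b
        layer_range: [8, 32]
merge_method: passthrough
dtype: float16
```

The result stacks the first 24 layers of one model on top of layers 8-32 of the other, producing a model with more layers than either input.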
### Model Breadcrumbs

An extension of task arithmetic that discards both small and extremely large differences from the base model. As with DARE, the Model Breadcrumbs algorithm can be used with (`breadcrumbs_ties`) or without (`breadcrumbs`) the sign consensus algorithm of TIES.

Parameters: same as Task Arithmetic, plus:

- `density` - fraction of weights in differences from the base model to retain
- `gamma` - fraction of largest magnitude differences to remove

Note that `gamma` corresponds with the parameter `β` described in the paper, while `density` is the final density of the sparsified tensors (related to `γ` and `β` by `density = 1 - γ - β`). For good default values, try `density: 0.9` and `gamma: 0.01`.
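Using those suggested defaults, a `breadcrumbs_ties` merge might be sketched as (placeholder paths):

```yaml
merge_method: breadcrumbs_ties
base_model: org/base-model      # placeholder paths
models:
  - model: org/finetune-a
    parameters:
      weight: 1.0
      density: 0.9              # final density of the sparsified task vectors
      gamma: 0.01               # drop the largest 1% of differences
  - model: org/finetune-b
    parameters:
      weight: 1.0
      density: 0.9
      gamma: 0.01
dtype: float16
```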
### Model Stock

Uses some neat geometric properties of fine tuned models to compute good weights for linear interpolation. Requires at least three models, including a base model.

Parameters:

- `filter_wise`: if true, weight calculation will be per-row rather than per-tensor. Not recommended.
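A minimal sketch with a base model and three finetunes (placeholder paths):

```yaml
merge_method: model_stock
base_model: org/base-model      # placeholder paths
models:
  - model: org/finetune-a
  - model: org/finetune-b
  - model: org/finetune-c
dtype: float16
```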
### NuSLERP

Spherically interpolate between parameters, but with more options and more sensical configuration! Does not require a base model, but can use one to do spherical interpolation of task vectors. Only works with either two models or two plus a base model.

Parameters:

- `weight`: relative weighting of a given tensor
- `nuslerp_flatten`: set to false to do row-wise/column-wise interpolation instead of treating tensors as vectors
- `nuslerp_row_wise`: SLERP row vectors instead of column vectors

To replicate the behavior of the original `slerp` method, set `weight` to `1-t` and `t` for your first and second model respectively.
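For example, to mimic `slerp` with `t = 0.3`, a NuSLERP configuration might look like this sketch (placeholder paths):

```yaml
merge_method: nuslerp
models:
  - model: org/model-a          # placeholder paths
    parameters:
      weight: 0.7               # 1 - t
  - model: org/model-b
    parameters:
      weight: 0.3               # t
dtype: bfloat16
```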
### DELLA

Building upon DARE, DELLA uses adaptive pruning based on parameter magnitudes. DELLA first ranks parameters in each row of delta parameters and assigns drop probabilities inversely proportional to their magnitudes. This allows it to retain more important changes while reducing interference. After pruning, it rescales the remaining parameters similar to DARE. DELLA can be used with (`della`) or without (`della_linear`) the sign elect step of TIES.

Parameters: same as Task Arithmetic, plus:

- `density` - fraction of weights in differences from the base model to retain
- `epsilon` - maximum change in drop probability based on magnitude. Drop probabilities assigned will range from `density - epsilon` to `density + epsilon`. (When selecting values for `density` and `epsilon`, ensure that the range of probabilities falls within 0 to 1.)
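A `della` merge might be sketched as follows (placeholder paths; with `density: 0.6` and `epsilon: 0.15`, the assigned probabilities stay within 0 to 1):

```yaml
merge_method: della
base_model: org/base-model      # placeholder paths
models:
  - model: org/finetune-a
    parameters:
      weight: 0.5
      density: 0.6
      epsilon: 0.15             # probabilities range from 0.45 to 0.75
  - model: org/finetune-b
    parameters:
      weight: 0.5
      density: 0.6
      epsilon: 0.15
dtype: float16
```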
### SCE

SCE introduces adaptive matrix-level merging weights based on parameter variances. SCE first selects the top-k% of elements from each parameter matrix that exhibit high variance across all delta parameters. Following this selection, SCE calculates matrix-level merging weights based on the sum of squares of elements in the delta parameters. Finally, it erases minority elements, a step similar to the sign election process in TIES.

Parameters: same as TIES, plus:

- `select_topk` - fraction of elements with the highest variance in the delta parameters to retain
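A sketch of an SCE merge (placeholder paths; the `select_topk` value is illustrative):

```yaml
merge_method: sce
base_model: org/base-model      # placeholder paths
models:
  - model: org/finetune-a
  - model: org/finetune-b
  - model: org/finetune-c
parameters:
  select_topk: 0.1              # keep the top 10% highest-variance elements
dtype: bfloat16
```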
## LoRA extraction

Mergekit allows extracting PEFT-compatible low-rank approximations of finetuned models.

```sh
mergekit-extract-lora --model finetuned_model_id_or_path --base-model base_model_id_or_path --out-path output_path [--no-lazy-unpickle] [--cuda] [--max-rank=desired_rank] [--sv-epsilon=tol]
```
## Mixture of Experts merging

The `mergekit-moe` script supports merging multiple dense models into a mixture of experts, either for direct use or for further training. For more details see the `mergekit-moe` documentation.
## Evolutionary merge methods

See `docs/evolve.md` for details.
## Merge in the Cloud

We host merging on Arcee's cloud GPUs - you can launch a cloud merge in the Arcee App. Or through python - grab an ARCEE_API_KEY:

```sh
export ARCEE_API_KEY=<your-api-key>
pip install -q arcee-py
```

```python
import arcee
arcee.merge_yaml("bio-merge", "./examples/bio-merge.yml")
```
Check your merge status at the Arcee App.

When complete, either deploy your merge:

```python
arcee.start_deployment("bio-merge", merging="bio-merge")
```
Or download your merge:

```sh
!arcee merging download bio-merge
```
## Citation

If you find `mergekit` useful in your research, please consider citing the [paper](https://aclanthology.org/2024.emnlp-industry.36):

```bibtex
@inproceedings{goddard-etal-2024-arcees,
    title = "Arcee{'}s {M}erge{K}it: A Toolkit for Merging Large Language Models",
    author = "Goddard, Charles and Siriwardhana, Shamane and Ehghaghi, Malikeh and Meyers, Luke and Karpukhin, Vladimir and Benedict, Brian and McQuade, Mark and Solawetz, Jacob",
    editor = "Dernoncourt, Franck and Preo{\c{t}}iuc-Pietro, Daniel and Shimorina, Anastasia",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    month = nov,
    year = "2024",
    address = "Miami, Florida, US",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-industry.36",
    doi = "10.18653/v1/2024.emnlp-industry.36",
    pages = "477--485",
    abstract = "The rapid growth of open-source language models provides the opportunity to merge model checkpoints, combining their parameters to improve performance and versatility. Advances in transfer learning have led to numerous task-specific models, which model merging can integrate into powerful multitask models without additional training. MergeKit is an open-source library designed to support this process with an efficient and extensible framework suitable for any hardware. It has facilitated the merging of thousands of models, contributing to some of the world{'}s most powerful open-source model checkpoints. The library is accessible at: https://github.com/arcee-ai/mergekit.",
}
```