mergekit

Tools for merging pretrained large language models.

6.9k stars (+576 stars/month) · 1 release in the last 6 months

Overview

mergekit is a comprehensive toolkit for merging pretrained large language models, designed to combine multiple specialized models into a single versatile system without the computational overhead of traditional ensembling. Using an out-of-core approach, it enables sophisticated model merging even in resource-constrained environments, running efficiently on CPU or with as little as 8 GB of VRAM. The tool supports major model architectures including Llama, Mistral, GPT-NeoX, and StableLM, and offers multiple merge algorithms to suit different use cases. By operating directly in weight space, mergekit lets researchers and practitioners transfer capabilities between models, find trade-offs between different behaviors, and create new functionality through creative combinations, all while keeping the same inference cost as a single model. With over 6,900 GitHub stars, it has become a staple in the AI research community for model experimentation and capability enhancement.
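To make the weight-space idea concrete, the simplest merge is a weighted average of matching tensors across checkpoints. The sketch below is illustrative only and is not mergekit's actual implementation; `linear_merge` and the toy state dicts are hypothetical names.

```python
import numpy as np

def linear_merge(state_dicts, weights):
    """Illustrative weight-space merge: weighted average of matching tensors.

    state_dicts: list of dicts mapping parameter names to arrays
    weights: one merge weight per model (normalized to sum to 1)
    """
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize so the merge is a convex combination
    merged = {}
    for name in state_dicts[0]:
        # Average each parameter tensor across models, weighted per model.
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged
```

Because the result has the same shapes as any single source model, inference cost is unchanged, which is the key property the overview describes.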

Pros

  • + Memory-efficient architecture enables complex merges on modest hardware (8GB VRAM minimum) using lazy tensor loading and out-of-core processing
  • + Comprehensive algorithm support includes linear interpolation, SLERP, DARE, and evolutionary methods for diverse merging strategies
  • + Production-ready with support for major model families (Llama, Mistral, GPT-NeoX) and flexible CPU/GPU execution options
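As a rough sketch of one of the algorithms listed above: SLERP interpolates along the arc between two flattened weight vectors instead of the straight line used by linear interpolation. The function below is a minimal illustration under that definition, not mergekit's implementation.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight vectors.

    t = 0 returns v0, t = 1 returns v1; intermediate t follows the arc
    between the two directions rather than the chord.
    """
    # Angle between the two vectors, computed on normalized copies.
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    omega = np.arccos(dot)
    if omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * v0 + t * v1
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```

Unlike linear interpolation, SLERP preserves the geometry of the interpolation path, which is why it is a popular choice when blending two similar fine-tunes.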

Cons

  • - Requires deep understanding of model architectures and merge parameters to achieve optimal results without degrading performance
  • - Limited documentation for advanced techniques may require experimentation to find best practices for specific use cases
  • - Merge quality heavily depends on compatibility between source models and their training distributions

Getting Started

1. Install: `pip install mergekit` and ensure you have sufficient disk space for model weights
2. Configure: Create a YAML merge configuration specifying source models, merge method (linear/slerp), and parameters
3. Execute: Run `mergekit-yaml config.yaml ./output-model --copy-tokenizer` to perform the merge and save the resulting model
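A minimal configuration for step 2 might look like the sketch below. The model names are placeholders, and the exact fields should be checked against mergekit's documentation for your version.

```yaml
# Hypothetical two-model SLERP merge; model names are illustrative.
models:
  - model: org/model-a-7b
  - model: org/model-b-7b
merge_method: slerp
base_model: org/model-a-7b
parameters:
  t: 0.5          # interpolation factor: 0 = base model, 1 = the other model
dtype: float16
```

Saving this as `config.yaml` and running the `mergekit-yaml` command from step 3 writes the merged model to `./output-model`.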