grok-1

Grok open release

Tags: open-source, agent-frameworks

51.5k stars · +4,294 stars/month · 0 releases in the last 6 months

Overview

Grok-1 is the open release of xAI's 314-billion-parameter large language model, built on a Mixture of Experts (MoE) architecture with 8 experts. This repository provides JAX example code for loading and running the open-weights checkpoint, making one of the largest openly available language models accessible to researchers and developers.

The model uses rotary position embeddings (RoPE), activation sharding, and 8-bit quantization support. It has 64 layers, 48 attention heads for queries, and a SentencePiece tokenizer with a 131,072-token vocabulary, and it handles sequences up to 8,192 tokens. The MoE routing activates only 2 of the 8 experts per token, so each forward pass uses a fraction of the full parameter count while the model retains its full capacity.

Released under the Apache 2.0 license, Grok-1 represents a significant contribution to open AI research, allowing academic institutions and independent researchers to experiment with a frontier-scale language model. The implementation prioritizes correctness and accessibility over efficiency, avoiding custom kernels to ensure broad compatibility. This makes Grok-1 particularly valuable for studying large-scale model architectures, conducting research on MoE systems, and serving as a foundation for derivative works in the open-source AI community.
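The top-2 routing described above can be sketched with a toy gating layer. This is a minimal NumPy illustration of the idea, not the repository's JAX implementation; the shapes, the random expert weights, and the `top2_moe` name are invented for the demo:

```python
import numpy as np

def top2_moe(x, w_gate, experts):
    """Route each token to its 2 highest-scoring experts (of 8),
    weighting their outputs by a softmax over the top-2 gate logits."""
    logits = x @ w_gate                              # (tokens, n_experts)
    top2 = np.argsort(logits, axis=-1)[:, -2:]       # best 2 experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top2[t]]
        gates = np.exp(sel - sel.max())
        gates /= gates.sum()                         # softmax over selected 2
        for g, e in zip(gates, top2[t]):
            out[t] += g * experts[e](x[t])
    return out

# Toy demo: 8 random linear experts, 4 tokens, hidden dim 16.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
w_gate = rng.normal(size=(16, 8))
experts = [lambda v, W=rng.normal(size=(16, 16)) / 4: v @ W for _ in range(8)]
y = top2_moe(x, w_gate, experts)
print(y.shape)  # (4, 16)
```

Only the two selected expert functions run per token; in Grok-1 this is what keeps the active compute far below the 314B total parameter count.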

Pros

  • + Massive 314B parameter model with state-of-the-art Mixture of Experts architecture released as fully open-source under Apache 2.0 license
  • + Comprehensive implementation with advanced features like rotary embeddings, activation sharding, and 8-bit quantization support for memory optimization
  • + High-quality codebase designed for correctness and accessibility, avoiding complex custom kernels to ensure broad research compatibility
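The rotary embeddings mentioned above encode position by rotating feature pairs of each query/key vector by an angle proportional to the token's position. A minimal sketch, using the split-half pairing convention (this is an illustration of the technique, not code from the repository):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotate each (x1_i, x2_i) feature pair of the token at position t
    by the angle t * base**(-2i/d). Pure rotation: norms are preserved."""
    seq, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)     # per-pair frequencies
    angles = np.outer(np.arange(seq), freqs)         # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.default_rng(0).normal(size=(6, 8))
q_rot = rope(q)
```

Because each pair is rotated rather than scaled, vector norms are unchanged, and dot products between rotated queries and keys depend only on their relative positions.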

Cons

  • - Requires extremely large GPU memory resources due to 314B parameter size, making it inaccessible to most individual researchers
  • - MoE layer implementation is intentionally inefficient, prioritizing validation over performance optimization
  • - Massive checkpoint download size (requires torrent or HuggingFace Hub) creates significant storage and bandwidth requirements
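The 8-bit quantization support mentioned in the pros is the main lever against the memory cost above: storing each weight as an int8 code plus a shared float scale roughly halves the footprint versus 16-bit weights. A minimal sketch of symmetric per-tensor int8 quantization (illustrative only; the repository's scheme may differ in granularity):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor 8-bit quantization: int8 codes + one float scale,
    i.e. ~1 byte per parameter instead of 2 (fp16) or 4 (fp32)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor for use in matmuls."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()   # worst-case rounding error
```

The maximum reconstruction error is bounded by half the scale, which is why 8-bit weights are usually an acceptable trade for a 314B-parameter checkpoint that would otherwise not fit in memory.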

Use Cases

Getting Started

1. Download the model weights using either the provided torrent magnet link or the HuggingFace Hub CLI, and place the ckpt-0 directory in the checkpoints folder.
2. Install the required dependencies by running 'pip install -r requirements.txt' in the project directory.
3. Execute 'python run.py' to load the checkpoint and test the model with sample input, ensuring you have sufficient GPU memory for the 314B-parameter model.
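The steps above as shell commands, sketched under the assumption that the weights are mirrored on HuggingFace Hub as xai-org/grok-1 (the checkpoint download is on the order of hundreds of gigabytes, so check disk space and bandwidth first):

```shell
# Clone the example code
git clone https://github.com/xai-org/grok-1.git
cd grok-1

# Fetch the ckpt-0 weights into ./checkpoints via the HuggingFace Hub CLI
# (alternative: use the torrent magnet link from the repo README)
pip install "huggingface_hub[hf_transfer]"
huggingface-cli download xai-org/grok-1 --repo-type model \
    --include "ckpt-0/*" --local-dir checkpoints

# Install dependencies and run the sample script
pip install -r requirements.txt
python run.py
```

run.py loads the checkpoint and samples from the model on a test prompt; it assumes a multi-GPU machine with enough memory to hold the 314B-parameter weights.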