mamba-chat

Mamba-Chat: A chat LLM based on the state-space model architecture 🐍

open-sourceagent-frameworks
Visit WebsiteView on GitHub
942
Stars
+79
Stars/month
0
Releases (6m)

Overview

Mamba-Chat is the first chat language model built on state-space model architecture instead of the traditional transformer design. Based on Albert Gu and Tri Dao's groundbreaking work 'Mamba: Linear-Time Sequence Modeling with Selective State Spaces', this model represents a significant architectural departure from conventional LLMs. Built on the Mamba-2.8B foundation model and fine-tuned on 16,000 carefully selected samples from the HuggingFaceH4/ultrachat_200k dataset, Mamba-Chat demonstrates that effective conversational AI can be achieved without transformer architecture. The project provides comprehensive training and fine-tuning code based on modified Huggingface Trainer classes, making it accessible for researchers and developers interested in exploring state-space models. With 942 GitHub stars, it has gained recognition as an important research contribution to the language modeling field. The model offers both CLI and web-based interfaces through Gradio, and supports custom fine-tuning with flexible configurations for different hardware setups, from standard GPUs to high-memory cards like RTX 3090/4090.

Pros

  • + Revolutionary state-space architecture offers linear-time sequence modeling as alternative to quadratic transformer attention
  • + Includes complete training and fine-tuning infrastructure with Huggingface integration and flexible hardware configurations
  • + Provides multiple interaction modes including CLI chatbot and Gradio web interface for easy accessibility

Cons

  • - Limited model size at 2.8B parameters compared to larger transformer-based alternatives
  • - Fine-tuned on relatively small dataset of 16,000 samples which may limit conversational capabilities
  • - Experimental architecture means less ecosystem support and fewer pre-trained variants available

Use Cases

Getting Started

1. Clone the repository and install dependencies with 'git clone https://github.com/havenhq/mamba-chat.git && cd mamba-chat && pip install -r requirements.txt'. 2. Start the CLI chatbot with 'python chat.py' to begin conversing with the model immediately. 3. For web interface, install Gradio with 'pip install gradio==4.8.0' and run 'python app.py --share' to create a shareable web interface.