mamba-chat

Mamba-Chat: A chat LLM based on the state-space model architecture 🐍

open-source · agent-frameworks
941 Stars · +8 Stars/month · 0 Releases (6m)

Star Growth: 922 → 961 (Mar 27 to Apr 1)

Overview

Mamba-Chat is the first chat language model built on a state-space model (SSM) architecture rather than the traditional transformer design. Based on Albert Gu and Tri Dao's paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces", it represents a significant architectural departure from conventional LLMs. Built on the Mamba-2.8B foundation model and fine-tuned on 16,000 selected samples from the HuggingFaceH4/ultrachat_200k dataset, Mamba-Chat demonstrates that effective conversational AI can be achieved without a transformer architecture.

The project provides training and fine-tuning code based on a modified Hugging Face Trainer class, making it accessible to researchers and developers exploring state-space models. With 941 GitHub stars, it has gained recognition as a notable research contribution to language modeling. The model offers both a CLI and a Gradio web interface, and supports custom fine-tuning with flexible configurations for different hardware setups, from standard GPUs to high-memory cards like the RTX 3090/4090.
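
The linear-time property can be illustrated with a toy state-space recurrence. This is only a sketch of the generic SSM idea, not Mamba's actual selective-scan implementation (which uses learned, input-dependent parameters and a hardware-aware parallel scan):

```python
# Toy linear state-space recurrence: h_t = A*h_{t-1} + B*x_t, y_t = C*h_t.
# Each step updates a fixed-size state, so a length-T sequence costs O(T),
# unlike self-attention, whose pairwise token scores cost O(T^2).
def ssm_scan(xs, A=0.9, B=1.0, C=0.5):
    h, ys = 0.0, []
    for x in xs:          # one O(1) update per token
        h = A * h + B * x
        ys.append(C * h)
    return ys

# Impulse response: the state (and hence memory of the input) decays geometrically.
print(ssm_scan([1.0, 0.0, 0.0]))
```

Scalar values are used here for clarity; real SSM layers carry a vector state per channel, but the per-token cost stays constant either way.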

Deep Analysis

Key Differentiator

vs. transformer-based chat models (LLaMA Chat, Mistral): the first conversational model built on Mamba's state-space architecture, enabling research into SSM alternatives to transformers for dialogue

⚡ Capabilities

  • First chat language model based on state-space model (SSM) architecture instead of transformers
  • Fine-tuned Mamba-2.8B on 16K samples from the UltraChat-200k dataset
  • CLI chatbot and Gradio web interface for interaction
  • Custom fine-tuning support on user datasets
  • Google Colab notebook for quick experimentation
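
Before generation, conversation turns have to be serialized into a single prompt string. A minimal sketch of such a prompt builder, assuming a Zephyr-style `<|user|>` / `<|assistant|>` template (an assumption; check the repo's chat.py and tokenizer config for the exact special tokens):

```python
# Hypothetical helper: flatten a list of chat turns into one prompt string
# using a Zephyr-style template. The token names below are an assumption,
# not confirmed against the mamba-chat repository.
def build_prompt(messages):
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}</s>")
    parts.append("<|assistant|>\n")   # trailing cue so the model responds next
    return "\n".join(parts)

prompt = build_prompt([{"role": "user", "content": "What is Mamba?"}])
```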

🔗 Integrations

Hugging Face · Google Colab · Gradio

✓ Best For

  • ✓ Researching state-space model architectures for conversational AI
  • ✓ Comparing SSM vs. transformer performance on chat tasks
  • ✓ Fine-tuning lightweight chat models on custom data

✗ Not Ideal For

  • ✗ Production-grade chatbot deployment
  • ✗ Tasks requiring transformer-level reasoning quality
  • ✗ Users without GPU hardware

Languages

Python

Deployment

CLI (chat.py) · Gradio web app (app.py) · Google Colab · local GPU (24GB recommended)

⚠ Known Limitations

  • ⚠ Requires ~24GB GPU memory for inference
  • ⚠ Limited to 2.8B parameter scale
  • ⚠ No performance benchmarks against transformer baselines documented
  • ⚠ Smaller training dataset (16K samples) compared to leading chat models

Pros

  • + State-space architecture offers linear-time sequence modeling as an alternative to quadratic transformer attention
  • + Includes complete training and fine-tuning infrastructure with Hugging Face integration and flexible hardware configurations
  • + Provides multiple interaction modes, including a CLI chatbot and a Gradio web interface, for easy accessibility

Cons

  • - Limited model size at 2.8B parameters compared to larger transformer-based alternatives
  • - Fine-tuned on relatively small dataset of 16,000 samples which may limit conversational capabilities
  • - Experimental architecture means less ecosystem support and fewer pre-trained variants available

Use Cases

  • β€’ Research into state-space model architectures for natural language processing and their efficiency advantages
  • β€’ Development of memory-efficient chatbots that require linear scaling with sequence length
  • β€’ Custom fine-tuning experiments on domain-specific conversational data using provided training infrastructure

Getting Started

1. Clone the repository and install dependencies: 'git clone https://github.com/havenhq/mamba-chat.git && cd mamba-chat && pip install -r requirements.txt'
2. Start the CLI chatbot with 'python chat.py' to begin conversing with the model.
3. For the web interface, install Gradio with 'pip install gradio==4.8.0' and run 'python app.py --share' to create a shareable web UI.
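
The CLI loop in step 2 boils down to accumulating turns and re-feeding the full history on each round. A minimal skeleton of that pattern, with a stubbed generate function standing in for the real model call (the actual chat.py may structure this differently):

```python
# Minimal chat-loop skeleton: keep the full history and rebuild the prompt
# every turn. `generate` is a stub; the real script runs the Mamba model.
def generate(prompt):
    return "stub reply"   # placeholder for model inference

def chat_turn(history, user_msg):
    history.append({"role": "user", "content": user_msg})
    # Simple role-prefixed serialization for illustration only.
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    reply = generate(prompt)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
chat_turn(history, "hello")
```

Because the whole history is re-fed each turn, prompt length grows with the conversation; the SSM's linear-time scaling is what keeps this cheap relative to attention.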
