mamba-chat
Mamba-Chat: A chat LLM based on the state-space model architecture
Overview
Mamba-Chat is the first chat language model built on a state-space model (SSM) architecture rather than the traditional transformer design. Based on Albert Gu and Tri Dao's work "Mamba: Linear-Time Sequence Modeling with Selective State Spaces", it represents a significant architectural departure from conventional LLMs. The model fine-tunes the Mamba-2.8B base model on 16,000 selected samples from the HuggingFaceH4/ultrachat_200k dataset, demonstrating that effective conversational AI can be built without transformer attention. The project provides training and fine-tuning code built on a modified Huggingface Trainer class, making it accessible to researchers and developers exploring state-space models. With 942 GitHub stars, it has gained recognition as a notable research contribution to language modeling. The model offers both a CLI and a Gradio web interface, and supports custom fine-tuning with flexible configurations for different hardware setups, from standard GPUs to high-memory cards like the RTX 3090/4090.
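Because Mamba-Chat is fine-tuned on multi-turn UltraChat conversations, inference requires rendering the dialogue into the template the model was trained on. The sketch below shows the general idea with a Zephyr-style `<|role|>` template; the exact tag format is an assumption here, so check the repository's chat script (or the tokenizer's chat template) before relying on it.

```python
# Sketch: rendering a conversation into a single prompt string for
# Mamba-Chat-style inference. The <|user|>/<|assistant|> tag format is an
# ASSUMPTION modeled on Zephyr/UltraChat conventions, not a confirmed spec.

def format_chat(messages):
    """Render a list of {"role": ..., "content": ...} turns into one prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}")
    # Leave the assistant tag open so the model generates the reply from here.
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

prompt = format_chat([
    {"role": "user", "content": "What is a state-space model?"},
])
print(prompt)
```

In practice the repository's own chat interface handles this formatting; the sketch only illustrates why a chat-tuned model needs its training-time template reproduced at inference time.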
Deep Analysis
vs transformer-based chat models (LLaMA Chat, Mistral): the first conversational model built on Mamba's state-space architecture; enables research into SSM alternatives to transformers for dialogue
Capabilities
- First chat language model based on state-space model (SSM) architecture instead of transformers
- Fine-tuned Mamba-2.8B on 16K samples from the UltraChat-200k dataset
- CLI chatbot and Gradio web interface for interaction
- Custom fine-tuning support on user datasets
- Google Colab notebook for quick experimentation
Integrations
Best For
- Researching state-space model architectures for conversational AI
- Comparing SSM vs transformer performance on chat tasks
- Fine-tuning lightweight chat models on custom data
Not Ideal For
- Production-grade chatbot deployment
- Tasks requiring transformer-level reasoning quality
- Users without GPU hardware
Known Limitations
- Requires ~24GB GPU memory for inference
- Limited to 2.8B parameter scale
- No performance benchmarks against transformer baselines documented
- Smaller training dataset (16K samples) compared to leading chat models
Pros
- + State-space architecture offers linear-time sequence modeling as an alternative to the quadratic cost of transformer attention
- + Includes complete training and fine-tuning infrastructure with Huggingface integration and flexible hardware configurations
- + Provides multiple interaction modes including CLI chatbot and Gradio web interface for easy accessibility
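The linear-time claim above comes from the SSM recurrence itself: each token updates a fixed-size state, so a length-L sequence costs O(L) state updates instead of attention's O(L²) pairwise interactions. The toy scan below illustrates that recurrence with a plain diagonal linear SSM; it is a didactic sketch, not Mamba's selective (input-dependent) scan or its fused CUDA kernel.

```python
import numpy as np

# Toy linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# Each step touches only the fixed-size state h, so a length-L sequence
# costs O(L * d) -- linear in L. This is an illustrative sketch, not
# Mamba's actual selective-scan implementation.

def ssm_scan(x, A, B, C):
    h = np.zeros(A.shape[0])      # hidden state, size d
    ys = []
    for x_t in x:                 # one pass over the sequence: O(L)
        h = A @ h + B * x_t       # state update
        ys.append(C @ h)          # scalar readout
    return np.array(ys)

rng = np.random.default_rng(0)
d, L = 4, 16
A = np.diag(rng.uniform(0.1, 0.9, d))   # stable diagonal dynamics
B = rng.standard_normal(d)
C = rng.standard_normal(d)
y = ssm_scan(rng.standard_normal(L), A, B, C)
print(y.shape)  # one output per input token: (16,)
```

Mamba additionally makes A, B, and C functions of the input (the "selective" part) and parallelizes the scan on GPU, but the asymptotic cost per token stays constant.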
Cons
- - Limited model size at 2.8B parameters compared to larger transformer-based alternatives
- - Fine-tuned on relatively small dataset of 16,000 samples which may limit conversational capabilities
- - Experimental architecture means less ecosystem support and fewer pre-trained variants available
Use Cases
- Research into state-space model architectures for natural language processing and their efficiency advantages
- Development of memory-efficient chatbots that require linear scaling with sequence length
- Custom fine-tuning experiments on domain-specific conversational data using provided training infrastructure
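For the custom fine-tuning use case, domain data typically needs to be packaged as one conversation per line in a JSONL file before it reaches the training script. The sketch below uses a `{"messages": [{"role", "content"}, ...]}` schema mirroring the UltraChat format; the exact field names expected by the repository's training script are an assumption and should be verified there.

```python
import json

# Sketch: packaging domain-specific conversations into a JSONL training file.
# The {"messages": [...]} schema is an ASSUMPTION modeled on the UltraChat
# dataset format; confirm the fields the repo's training script expects.

conversations = [
    [
        {"role": "user", "content": "Summarize our refund policy."},
        {"role": "assistant", "content": "Refunds are issued within 30 days."},
    ],
]

def write_jsonl(conversations, path):
    with open(path, "w") as f:
        for turns in conversations:
            f.write(json.dumps({"messages": turns}) + "\n")

write_jsonl(conversations, "custom_chat_data.jsonl")
```

One conversation per line keeps the file streamable, so the Trainer-based pipeline can load large corpora without holding everything in memory.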