lumos

Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"

open-source, agent-frameworks
475 stars · +0 stars/month · 0 releases (last 6 months)

Star Growth

[Chart: approximately 466 to 485 stars, Mar 27 to Apr 1]

Overview

Lumos is an open-source language agent framework that provides unified training for complex interactive tasks using modular design principles. Built on LLAMA-2-7B/13B models, it features a three-module architecture consisting of planning, grounding, and execution components. The framework is trained on approximately 56,000 high-quality subgoal and action annotations derived from ground-truth reasoning steps across multiple benchmarks. Lumos achieves competitive performance with GPT-4/3.5-based agents on web navigation, complex question answering, mathematical reasoning, and multimodal tasks. Its unified data format enables seamless support for diverse interactive tasks, making it a valuable resource for researchers developing open-source agents. The framework outperforms contemporaneous fine-tuned agents such as FireAct, AgentLM, and AutoAct on benchmarks including Mind2Web, HotpotQA, WebShop, and InterCode_SQL, while retaining the transparency and accessibility advantages of open-source models.

Deep Analysis

Key Differentiator

vs GPT-4 agents: a unified modular framework from Allen AI that achieves competitive performance with 7B-13B models; the separation of planning, grounding, and execution enables a task-agnostic agent architecture

Capabilities

  • Modular language agent framework: planning, grounding, and execution
  • Task decomposition into subgoals with API-mapped actions
  • Handles web navigation, complex QA, math, and multimodal reasoning
  • Unified data format across diverse task types
  • Competitive with GPT-4 agents using smaller 7B-13B models
  • Training data and checkpoints freely available
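The modular split described above can be sketched as a simple loop: the planning module decomposes the task into subgoals, the grounding module maps each subgoal to an API-style action, and the execution module runs it. The function names and rule-based stubs below are purely illustrative; in Lumos the planning and grounding modules are fine-tuned LLAMA-2 models, not hand-written rules.

```python
# Illustrative sketch of a planning -> grounding -> execution loop.
# All stub logic here is invented for demonstration; Lumos backs the
# planning and grounding steps with fine-tuned LLAMA-2-7B/13B models.

def plan(task: str) -> list:
    """Planning module: decompose a task into natural-language subgoals."""
    steps = task.split(", then ")
    return ["Subgoal %d: %s" % (i + 1, step) for i, step in enumerate(steps)]

def ground(subgoal: str) -> dict:
    """Grounding module: map a subgoal to an executable, API-mapped action."""
    action = "search" if "find" in subgoal.lower() else "calculate"
    return {"action": action, "argument": subgoal.split(": ", 1)[1]}

def execute(action: dict) -> str:
    """Execution module: run the action against an external tool (stubbed)."""
    return "result of %s(%r)" % (action["action"], action["argument"])

def run_agent(task: str) -> list:
    """Chain the three modules over every planned subgoal."""
    return [execute(ground(subgoal)) for subgoal in plan(task)]

print(run_agent("find the population of France, then divide it by 2"))
```

Keeping the three modules behind separate interfaces like this is what lets each one be trained, swapped, or debugged independently, which is the design point the framework emphasizes.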

🔗 Integrations

  • LLAMA-2 (7B/13B)
  • GPT-4 (for annotations)
  • Hugging Face
  • CUDA/PyTorch

Best For

  • Multi-step reasoning: web navigation, QA, math problem-solving
  • Research into efficient agent architectures with small models
  • Building agents competitive with GPT-4 at lower cost

Not Ideal For

  • Real-time applications needing extreme low latency
  • Specialized domains without training data coverage
  • Teams without GPU infrastructure for fine-tuning

Languages

Python

Deployment

  • Hugging Face model hub
  • Fine-tuning scripts
  • Hugging Face Spaces demo

Known Limitations

  • May underperform on specialized domains outside training distribution
  • Requires significant compute for model fine-tuning
  • Limited to task types covered by training data

Pros

  • + Modular architecture with separate planning, grounding, and execution components enables flexible customization and debugging
  • + Unified data format supports multiple task types (web navigation, QA, math, multimodal) within a single framework
  • + Competitive performance with much larger proprietary models while being fully open-source and based on smaller LLAMA-2 models

Cons

  • - Based on the LLAMA-2 architecture, which is older and may not incorporate the latest language model advances
  • - Primarily research-focused with limited documentation for production deployment
  • - Requires significant computational resources for training and may need fine-tuning for domain-specific applications

Use Cases

  • Research into open-source language agents and comparative studies against proprietary models
  • Web navigation and automation tasks requiring multi-step planning and execution
  • Complex question answering systems that need to break down problems into actionable subgoals
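For the question-answering use case above, a training annotation in the unified format pairs each subgoal with the API calls that ground it. The record below is a hand-written illustration: the field names and API names (`KnowledgeQuery`, `QA`) are assumptions for demonstration, so consult the repository's data files for the actual schema.

```python
# Hypothetical example of a unified subgoal/action annotation.
# Field and API names are illustrative, not Lumos's actual schema.

annotation = {
    "task_type": "complex_qa",
    "question": "Which film by the director of Jaws won Best Picture?",
    "subgoals": [
        {
            "id": 1,
            "text": "Identify the director of Jaws",
            "actions": [{"api": "KnowledgeQuery", "args": ["Jaws"]}],
        },
        {
            "id": 2,
            "text": "Find which of that director's films won Best Picture",
            "actions": [
                {"api": "KnowledgeQuery", "args": ["films by the director"]},
                {"api": "QA", "args": ["Which of these won Best Picture?"]},
            ],
        },
    ],
}

# Because every task type shares this shape, one training pipeline
# can consume web-navigation, QA, and math annotations alike.
for sg in annotation["subgoals"]:
    print(sg["id"], sg["text"], [a["api"] for a in sg["actions"]])
```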

Getting Started

  • Clone the repository and install dependencies from the provided requirements file
  • Download the pre-trained Lumos models from Hugging Face, or train your own using the provided training scripts
  • Run the demo interface, or integrate the modular components into your application for specific interactive tasks
