petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

open-source · agent-frameworks
10.0k Stars · +836 Stars/month · 0 Releases (6m)

Overview

Petals is a distributed peer-to-peer system for running large language models (LLMs) collaboratively across multiple machines, similar to BitTorrent file sharing. Instead of requiring powerful hardware to run massive models like Llama 3.1 (405B parameters), Mixtral (8x22B), or BLOOM (176B), Petals distributes model layers across a network of volunteer computers, letting users access these models from modest hardware such as desktop machines or Google Colab instances. The system maintains compatibility with the Hugging Face Transformers API, making it easy to integrate into existing workflows.

Users can perform both inference and fine-tuning, with claimed performance improvements of up to 10x over traditional offloading methods. The platform operates as a community-driven initiative in which participants contribute GPU resources to collectively host model layers, forming a shared computational network. Petals supports both public swarms for general use and private swarms for sensitive applications, providing flexibility for different privacy requirements.

The system includes a web-based chatbot interface and programmatic API access, making it accessible to both technical and non-technical users. With over 10,000 GitHub stars, Petals represents a novel approach to democratizing access to large language models by leveraging distributed computing principles.
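The core idea is pipeline parallelism over a swarm: each peer hosts a contiguous slice of the model's transformer blocks, and the client streams activations through peers in order. The toy sketch below (plain Python, not Petals' actual implementation; the `Peer` class and arithmetic "layers" are invented for illustration) shows how a forward pass can be routed through layers that live on different machines:

```python
# Toy illustration of swarm-style pipeline parallelism (NOT Petals' real code):
# each peer holds a slice of the model's layers; the client forwards
# activations through the peers in sequence, like Petals does with
# real transformer blocks.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Peer:
    name: str
    layers: List[Callable[[float], float]]  # each "layer" is just a function here

    def forward(self, x: float) -> float:
        # Run the activations through this peer's slice of the model.
        for layer in self.layers:
            x = layer(x)
        return x


def swarm_forward(peers: List[Peer], x: float) -> float:
    # The client streams the activation peer to peer, like a pipeline.
    for peer in peers:
        x = peer.forward(x)
    return x


# Four "layers" split across two peers.
peers = [
    Peer("peer-a", [lambda x: x + 1, lambda x: x * 2]),
    Peer("peer-b", [lambda x: x - 3, lambda x: x * 10]),
]
print(swarm_forward(peers, 5.0))  # ((5 + 1) * 2 - 3) * 10 = 90.0
```

In real Petals, each "layer" is a transformer block served over the network, peers advertise which blocks they host, and the client picks a route through currently available servers.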

Pros

  • + Enables running very large models (405B+ parameters) on modest hardware through distributed computing
  • + Maintains full compatibility with Hugging Face Transformers API for easy integration
  • + Claims significant performance improvements (up to 10x faster) for fine-tuning and inference compared to offloading

Cons

  • - Data privacy concerns since processing occurs across public swarm of unknown participants
  • - Dependency on community-contributed GPU resources for model availability and performance
  • - Potential network latency and reliability issues inherent in distributed systems

Use Cases

Getting Started

Install Petals with `pip install petals`, then import the distributed model class (`from petals import AutoDistributedModelForCausalLM`). Pick a model that is currently hosted on the swarm (listed at health.petals.dev), load it by name, and generate text using standard Transformers syntax.
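A minimal sketch of that flow, assuming `petals-team/StableBeluga2` is currently hosted on the public swarm (check health.petals.dev for live model availability; names and hosted models change over time):

```python
# Sketch of Petals inference via the Hugging Face-style API.
# Assumes `pip install petals` and that the model below is on the swarm.
MODEL_NAME = "petals-team/StableBeluga2"  # example; see health.petals.dev


def generate(prompt: str, max_new_tokens: int = 40) -> str:
    # Imported lazily so this file still loads without petals installed.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # Downloads only a small client-side portion of the model; the heavy
    # transformer blocks are served by peers in the swarm.
    model = AutoDistributedModelForCausalLM.from_pretrained(MODEL_NAME)

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0])


if __name__ == "__main__":
    print(generate("Distributed inference means"))
```

Because the forward pass runs across the network, first-token latency depends on which peers are online; the health page shows which blocks of each model are currently served.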