guidance

A guidance language for controlling large language models.

open-source · agent-frameworks
21.4k Stars · +0 Stars/month · 2 Releases (6m)

Star Growth

[Star growth chart, Mar 27 to Apr 1]

Overview

Guidance is a Python framework that provides programmatic control over large language model outputs, enabling developers to steer generation with precision while reducing costs and latency compared to traditional prompting or fine-tuning approaches. The tool allows users to constrain generation using regex patterns and context-free grammars, ensuring output follows specific formats and structures. It supports seamless interleaving of control logic (conditionals, loops, tool usage) with text generation, making it possible to build complex conversational flows and structured data extraction pipelines. Guidance works with multiple backends including Transformers, llama.cpp, and OpenAI models, providing a unified Pythonic interface regardless of the underlying model.

The framework is particularly valuable for applications requiring reliable output formatting, structured data extraction, or complex multi-step reasoning workflows. With over 21,000 GitHub stars, it has gained significant adoption in the AI community for its ability to make language model interactions more predictable and cost-effective while maintaining the flexibility of programmatic control.

Deep Analysis

Key Differentiator

Unlike prompt-based structured-output approaches (such as OpenAI's JSON mode), Guidance enforces output constraints at the token level using grammars. This guarantees valid output on every generation and reduces latency through intelligent token fast-forwarding; few other frameworks offer this depth of generation control.
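The core idea can be sketched in a few lines of plain Python. This is an illustration of token-level masking, not the guidance API: at each step, candidates that cannot extend any allowed output are masked out, so the result is well-formed by construction no matter what the model prefers.

```python
# Illustrative sketch of token-level constrained decoding (not the
# guidance API): mask out candidates that cannot extend any allowed
# output, then let the "model" (here, a scoring function) choose
# among what remains.

def select_decode(options, score):
    """Produce exactly one string from `options`, character by character.
    `score` stands in for the model's logits over candidate characters."""
    out = ""
    while out not in options:
        # Characters that keep `out` a prefix of at least one option.
        valid = {opt[len(out)] for opt in options
                 if opt.startswith(out) and len(opt) > len(out)}
        out += max(valid, key=score)  # model picks only among valid tokens
    return out

# Even a scorer that strongly prefers one path cannot derail the output:
# the result is always a member of `options`.
print(select_decode({"yes", "no", "maybe"}, score=lambda c: c == "n"))
```

Real constrained decoding works on tokens and logits rather than characters, but the invariant is the same: invalid continuations receive zero probability, so no retry loop is ever needed.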

Capabilities

  • Constrained language model generation via regex patterns and context-free grammars
  • Guaranteed structured output (JSON, specific formats) without post-processing
  • Interleaved control flow combining Python conditionals/loops with LLM generation
  • Token fast-forwarding to skip known tokens and reduce latency/cost
  • Custom function composition through @guidance decorator
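Token fast-forwarding in particular can be illustrated with a toy decoder (plain Python, not the guidance API): whenever the constraint leaves exactly one legal next token, it is appended directly instead of paying for a model call.

```python
# Toy illustration of token fast-forwarding (not the guidance API):
# forced tokens are appended without a model call, so constrained
# generation can be cheaper and faster than free-form generation.

def decode_with_fast_forward(options, score):
    out, model_calls = "", 0
    while out not in options:
        valid = sorted({opt[len(out)] for opt in options
                        if opt.startswith(out) and len(opt) > len(out)})
        if len(valid) == 1:
            out += valid[0]            # forced token: fast-forwarded
        else:
            model_calls += 1           # genuine choice: ask the "model"
            out += max(valid, key=score)
    return out, model_calls

# The shared prefix "temp" is forced character by character; only the
# e/o fork actually costs a model call.
result = decode_with_fast_forward(
    {"temperature", "temporary"}, score=lambda c: c == "o")
```

Here `result` is `("temporary", 1)`: nine characters generated, one model call. Guidance applies the same trick at the token level for fixed prompt text and grammar-forced tokens.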

🔗 Integrations

  • Transformers (Hugging Face)
  • llama.cpp
  • OpenAI API
  • Pydantic (JSON schema generation)
  • Jupyter notebooks (widgets)

Best For

  • Developers needing guaranteed structured output from LLMs without retry loops or post-processing
  • Teams optimizing LLM inference cost and latency through constrained generation

Not Ideal For

  • Free-form creative text generation, since guidance adds constraints by design
  • Teams using only API-based LLMs without local model access, since the strongest constraint features require local models

Languages

Python

Deployment

  • pip install from PyPI
  • Local Python environment
  • Jupyter notebook integration

Pricing Detail

Free: Fully open-source
Paid: N/A

Known Limitations

  • Context-free grammar constraints require full backend LLM support
  • Python only — no JavaScript/TypeScript implementation
  • Not all LLM backends support all constraint features equally

Pros

  • + Pythonic interface that integrates naturally with existing Python workflows and familiar programming patterns
  • + Constrained generation capabilities that guarantee output syntax and structure using regex and context-free grammars
  • + Multi-backend support allowing seamless switching between different model providers and local/cloud deployments

Cons

  • - Requires Python programming knowledge, limiting accessibility for non-technical users
  • - Learning curve for advanced constraint features like context-free grammars and complex regex patterns
  • - Dependent on backend availability and may require additional setup for specific model types

Use Cases

  • Structured data extraction from documents or conversations where output must conform to specific JSON schemas or formats
  • Building conversational AI applications that require controlled dialogue flows and predictable response structures
  • Cost-effective alternative to fine-tuning when you need specific output formatting without retraining models
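The structured-extraction pattern can be mimicked in plain Python (an illustration of the interleaving idea, not the guidance API): fixed template text is emitted verbatim while each slot is filled under a regex constraint, so the result parses as valid JSON on every generation.

```python
import json
import re

# Illustration of guidance-style interleaving (not the actual API):
# literal template chunks are emitted as-is, while each slot is filled
# by a constrained sampler and checked against its regex.

def fill_template(parts, sample):
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)              # fixed text: emitted verbatim
        else:
            value = sample(part.pattern)  # stand-in for constrained gen
            assert part.fullmatch(value)  # constraint guarantees the shape
            out.append(value)
    return "".join(out)

template = ['{"name": "', re.compile(r"[A-Za-z]+"),
            '", "age": ', re.compile(r"\d+"), "}"]

# A fake "model" returning canned strings per slot pattern.
fake_model = {r"[A-Za-z]+": "Ada", r"\d+": "36"}
record = json.loads(fill_template(template, fake_model.__getitem__))
```

Because the braces, keys, and quotes come from the template and each slot is regex-constrained, `json.loads` can never fail, which is exactly the guarantee that removes retry loops and post-processing.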

Getting Started

Install Guidance with pip install guidance. Then import and initialize your preferred model backend (e.g., Transformers, LlamaCpp, or OpenAI), and create your first controlled generation using the system/user/assistant context managers with gen() for constrained output.
