uqlm

UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.

Stats: 1.1k stars · +94 stars/month · 10 releases in the last 6 months

Overview

UQLM (Uncertainty Quantification for Language Models) is a Python library for detecting hallucinations in Large Language Model outputs using uncertainty quantification techniques. Developed by CVS Health and backed by peer-reviewed research published in JMLR and TMLR, it addresses one of the most critical challenges in deploying LLMs in production: detecting when models generate incorrect or fabricated information.

The library provides a suite of response-level scorers that analyze LLM outputs and return confidence scores between 0 and 1, where higher scores indicate a lower likelihood of hallucinations or errors. UQLM categorizes scorer types by their latency, cost, and compatibility characteristics, letting users choose the appropriate method for their requirements and constraints. This flexibility makes it suitable both for research and for production deployments where reliability is paramount.

The tool's academic foundation means its uncertainty quantification methods are scientifically validated, while its practical design allows integration into existing LLM workflows. With over 1,100 GitHub stars, UQLM has gained recognition in the AI community as a solution for improving LLM trustworthiness.
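To make the response-level idea concrete, here is a minimal sketch of a consistency-style scorer: sample several responses to the same prompt and score how much they agree, yielding a confidence in [0, 1]. This is an illustrative toy using string similarity, not UQLM's actual implementation; the function name `consistency_score` is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def consistency_score(responses: list[str]) -> float:
    """Toy response-level confidence: mean pairwise string similarity
    across sampled responses, in [0, 1]. Higher means the samples agree
    more, i.e. a lower estimated chance of hallucination.
    (Hypothetical helper for illustration, not part of UQLM.)"""
    if len(responses) < 2:
        return 1.0  # a single sample cannot disagree with itself
    sims = [SequenceMatcher(None, a, b).ratio()
            for a, b in combinations(responses, 2)]
    return sum(sims) / len(sims)

# Identical samples score 1.0; divergent samples score lower.
consistent = ["Paris is the capital of France."] * 3
divergent = ["Paris is the capital of France.",
             "Lyon is the capital of France.",
             "The capital of France is Marseille."]
print(consistency_score(consistent))  # 1.0
print(consistency_score(divergent))   # well below 1.0
```

Production scorers typically use semantic rather than surface-level similarity, but the shape of the computation — sample, compare, aggregate into a single confidence — is the same.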

Pros

  • + Research-backed uncertainty quantification methods published in top-tier academic journals (JMLR, TMLR)
  • + Multiple scorer types offering different trade-offs between latency, cost, and accuracy for flexible deployment
  • + Simple installation and integration with existing LLM workflows through PyPI distribution

Cons

  • - Requires Python 3.10+, which may limit compatibility with older environments
  • - Different scorers add varying levels of latency and computational cost to LLM inference
  • - Limited to response-level scoring rather than token-level or real-time uncertainty detection

Use Cases

Getting Started

1. Install the package: pip install uqlm
2. Import and initialize a scorer appropriate for your latency and cost requirements
3. Pass your LLM outputs to the scorer to receive confidence scores between 0 and 1
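Once a scorer has produced confidence scores, a common downstream pattern is to gate responses on a threshold before showing them to users. The sketch below assumes only a list of (response, score) pairs; the function name `gate_responses` and the 0.7 cutoff are arbitrary examples, not part of UQLM's API.

```python
def gate_responses(scored, threshold=0.7):
    """Split scored LLM outputs into accepted and flagged sets.
    `scored` is a list of (response_text, confidence) pairs, where
    confidence is in [0, 1] and higher means less likely hallucinated.
    The threshold is application-specific; 0.7 is an arbitrary example."""
    accepted = [(r, s) for r, s in scored if s >= threshold]
    flagged = [(r, s) for r, s in scored if s < threshold]
    return accepted, flagged

scored = [("The Eiffel Tower is in Paris.", 0.95),
          ("The moon is made of cheese.", 0.12)]
accepted, flagged = gate_responses(scored)
print(len(accepted), len(flagged))  # 1 1
```

Flagged responses can then be retried, routed to a stronger model, or surfaced with a warning, depending on how much latency and cost the application tolerates.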