llm-comparator
LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.
521 stars · +0 stars/month · 0 releases in the last 6 months
Key Differentiator
Versus generic eval dashboards: combines visual analytics with rationale clustering and custom field analysis to pinpoint specific behavioral differences between two models. Developed by the Google PAIR team.
⚡ Capabilities
- Interactive visualization for side-by-side LLM evaluation results
- Qualitative difference discovery at the example and slice levels
- Rationale clustering to identify behavioral patterns between models
- Custom field analysis for prompt-category breakdowns
- Python library for generating the analysis JSON from evaluation data
- Support for LLM-as-a-judge evaluation methods
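The Python library's job is to produce a JSON file in the schema the web app loads. A minimal sketch of assembling that file by hand, assuming the documented two-model layout (field names such as `input_text`, `output_text_a`, and `score` follow the project's schema docs and should be verified against the repo before use):

```python
import json

def make_comparison_json(model_a, model_b, rows):
    """Build an LLM Comparator input dict.

    rows: iterable of (prompt, response_a, response_b, score) tuples.
    By the tool's convention, the score is a single number per example;
    its sign indicates which model's response was judged better.
    """
    return {
        # custom_fields_schema can declare extra per-example fields
        # (e.g. prompt category) for the custom field analysis view.
        "metadata": {"custom_fields_schema": []},
        "models": [{"name": model_a}, {"name": model_b}],
        "examples": [
            {
                "input_text": prompt,
                "tags": [],
                "output_text_a": a,
                "output_text_b": b,
                "score": score,
            }
            for prompt, a, b, score in rows
        ],
    }

data = make_comparison_json(
    "model-a", "model-b",
    [("What is 2+2?", "4", "Four.", 0.5)],
)
with open("comparison.json", "w") as f:
    json.dump(data, f, indent=2)
```

The resulting file can then be loaded into the hosted web app for inspection. The library itself can also generate this JSON directly from raw model outputs via its LLM-as-a-judge pipeline, which is the intended workflow when no scores exist yet.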
🔗 Integrations
Google Vertex AI AutoSxS · Chatbot Arena · Jupyter notebooks
✓ Best For
- Comparing two LLM outputs with numerical evaluation scores
- Discovering when and why one model outperforms another
- Analyzing response patterns across prompt categories
✗ Not Ideal For
- Comparing more than two models simultaneously
- Unstructured qualitative feedback without scoring
- Real-time model monitoring in production
Languages
TypeScript · JavaScript · Python
Deployment
Hosted web app (GitHub Pages) · local npm build · Python package on PyPI for data generation
⚠ Known Limitations
- Research project in active development; bugs are expected
- Compares only two models at a time
- Requires JSON input that follows a specific schema
- Small development team with limited support