📡

Observability & Evaluation

Monitoring, tracing, and testing infrastructure for running AI agents reliably in production

65 tools

MinerU

free

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

⭐ 57.4k ↑ 4782/mo observability-evaluation

worldmonitor

open-source

Real-time global intelligence dashboard. AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface

⭐ 44.6k ↑ 3716/mo observability-evaluation

ragflow

open-source

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

⭐ 76.4k ↑ 6367/mo observability-evaluation

litellm

free

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropi

⭐ 41.2k ↑ 3433/mo voice-agents

firecrawl

free

🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data

⭐ 99.2k ↑ 8267/mo browser-web-agents

haystack

open-source

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, m

⭐ 24.6k ↑ 2053/mo observability-evaluation

langfuse

open-source

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

⭐ 23.9k ↑ 1990/mo observability-evaluation

mastra

free

From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack.

⭐ 22.4k ↑ 1866/mo observability-evaluation

prefect

open-source

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

⭐ 22.0k ↑ 1831/mo observability-evaluation

Scrapegraph-ai

open-source

Python scraper based on AI

⭐ 23.1k ↑ 1928/mo observability-evaluation

promptfoo

open-source

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and

⭐ 18.6k ↑ 1553/mo observability-evaluation

opik

open-source

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

⭐ 18.5k ↑ 1543/mo observability-evaluation

weaviate

open-source

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a c

⭐ 15.9k ↑ 1325/mo observability-evaluation

unstructured

open-source

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to

⭐ 14.3k ↑ 1195/mo observability-evaluation

tensorzero

open-source

TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.

⭐ 11.2k ↑ 930/mo observability-evaluation

txtai

open-source

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

⭐ 12.4k ↑ 1029/mo observability-evaluation

phoenix

free

AI Observability & Evaluation

⭐ 9.1k ↑ 755/mo observability-evaluation

DocsGPT

open-source

Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

⭐ 17.8k ↑ 1483/mo observability-evaluation

llmware

open-source

Unified framework for building enterprise RAG pipelines with small, specialized models

⭐ 14.9k ↑ 1239/mo observability-evaluation

deepeval

open-source

The LLM Evaluation Framework

⭐ 14.3k ↑ 1193/mo observability-evaluation

openllmetry

open-source

Open-source observability for your GenAI or LLM application, based on OpenTelemetry

⭐ 7.0k ↑ 580/mo observability-evaluation

voltagent

open-source

AI Agent Engineering Platform built on an Open Source TypeScript AI Agent Framework

⭐ 7.0k ↑ 588/mo observability-evaluation

oumi

open-source

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

⭐ 8.9k ↑ 743/mo observability-evaluation

manifest

open-source

Smart LLM Routing for OpenClaw. Cut Costs up to 70% 🦞🦚

⭐ 4.1k ↑ 343/mo observability-evaluation

agenta

free

The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

⭐ 4.0k ↑ 332/mo observability-evaluation

langroid

open-source

Harness LLMs with Multi-Agent Programming

⭐ 3.9k ↑ 329/mo voice-agents

bifrost

open-source

Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

⭐ 3.3k ↑ 273/mo observability-evaluation

langwatch

free

The platform for LLM evaluations and AI agent testing

⭐ 3.2k ↑ 264/mo no-code-agent-builders

vanna

open-source

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.

⭐ 23.1k ↑ 1928/mo no-code-agent-builders

ragas

open-source

Supercharge Your LLM Application Evaluations 🚀

⭐ 13.1k ↑ 1094/mo observability-evaluation

openlit

open-source

Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 50+ LLM Providers,

⭐ 2.3k ↑ 194/mo observability-evaluation

gorilla

open-source

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

⭐ 12.8k ↑ 1065/mo voice-agents

llm-app

open-source

Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data. 🐳 Docker-friendly. ⚡ Always in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, a

⭐ 59.4k ↑ 4949/mo observability-evaluation

OmniRoute

open-source

OmniRoute is an AI gateway for multi-provider LLMs: an OpenAI-compatible endpoint with smart routing, load balancing, retries, and fallbacks. Add policies, rate limits, caching, and observability for

⭐ 1.3k ↑ 109/mo observability-evaluation

WFGY

free

WFGY is an open-source AI Troubleshooting Atlas for RAG, agents, and real-world AI workflows. Includes the 16-problem map, Global Debug Card, and WFGY 3.0. ⭐ Star to help more builders find this repo.

⭐ 1.7k ↑ 140/mo observability-evaluation

helicone

open-source

🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

⭐ 5.4k ↑ 446/mo observability-evaluation

Langchain-Chatchat

open-source

Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Ll

⭐ 37.7k ↑ 3139/mo observability-evaluation

FastChat

open-source

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

⭐ 39.5k ↑ 3288/mo observability-evaluation

uqlm

open-source

UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection

⭐ 1.1k ↑ 94/mo observability-evaluation

agentops

open-source

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and Ca

⭐ 5.4k ↑ 451/mo observability-evaluation

storm

open-source

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

⭐ 28.0k ↑ 2337/mo observability-evaluation

evals

free

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

⭐ 18.1k ↑ 1508/mo observability-evaluation

gitingest

open-source

Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase

⭐ 14.2k ↑ 1186/mo observability-evaluation

GPTDiscord

open-source

A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledge base, YouTube summarizer, and more!

⭐ 1.9k ↑ 154/mo observability-evaluation

llama-github

open-source

Llama-github is an open-source Python library that empowers LLM Chatbots, AI Agents, and Auto-dev Solutions to conduct Agentic RAG from actively selected GitHub public projects. It Augments through LL

⭐ 319 ↑ 27/mo observability-evaluation

R2R

open-source

SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.

⭐ 7.7k ↑ 646/mo observability-evaluation

AgentBench

open-source

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

⭐ 3.3k ↑ 273/mo observability-evaluation

Verba

open-source

Retrieval Augmented Generation (RAG) chatbot powered by Weaviate

⭐ 7.6k ↑ 635/mo observability-evaluation

TaskingAI

open-source

The open source platform for AI-native application development.

⭐ 5.4k ↑ 448/mo voice-agents

vision-agent

open-source

This tool has been deprecated. Use Agentic Document Extraction instead.

⭐ 5.3k ↑ 440/mo observability-evaluation

bRAG-langchain

free

Everything you need to know to build your own RAG application

⭐ 4.1k ↑ 339/mo observability-evaluation

text-extract-api

open-source

Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSO

⭐ 3.1k ↑ 256/mo observability-evaluation

pezzo

open-source

🕹️ Open-source, developer-first LLMOps platform designed to streamline prompt design, version management, instant delivery, collaboration, troubleshooting, observability and more.

⭐ 3.2k ↑ 268/mo observability-evaluation

ChainForge

open-source

An open-source visual programming environment for battle-testing prompts to LLMs.

⭐ 3.0k ↑ 247/mo no-code-agent-builders

clip-retrieval

open-source

Easily compute clip embeddings and build a clip retrieval system with them

⭐ 2.7k ↑ 228/mo observability-evaluation

uptrain

open-source

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform ro

⭐ 2.3k ↑ 195/mo observability-evaluation

LLM-eval-survey

free

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

⭐ 1.6k ↑ 133/mo observability-evaluation

langfair

free

LangFair is a Python library for conducting use-case level LLM bias and fairness assessments

⭐ 255 ↑ 21/mo observability-evaluation

swiss_army_llama

free

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

⭐ 1.1k ↑ 88/mo observability-evaluation

canopy

open-source

Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone

⭐ 1.0k ↑ 86/mo observability-evaluation

langkit

open-source

🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & security. 🛡️ Features include text quality, relevance m

⭐ 980 ↑ 82/mo observability-evaluation

auto-evaluator

free

Evaluation tool for LLM QA chains

⭐ 782 ↑ 65/mo observability-evaluation

llm-comparator

open-source

LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR team.

⭐ 521 ↑ 43/mo no-code-agent-builders

bananalyzer

open-source

Open source AI Agent evaluation framework for web tasks 🐒🍌

⭐ 327 ↑ 27/mo observability-evaluation

repochat

open-source

Chatbot assistant enabling GitHub repository interaction using LLMs with Retrieval Augmented Generation

⭐ 316 ↑ 26/mo observability-evaluation