AI Reliability Platform

AI apps don't always do what you want. Galileo is the end-to-end platform for AI evaluation, observability, and real-time protection, so you can ship with confidence.

Complete

For all your AI workflows

You can't ship when you're flying blind. Galileo is the best way to measure AI accuracy, offline and online. Start with out-of-the-box evaluators, or create your own. Only Galileo distills evaluators into compact models that run at low latency and low cost.

The Galileo platform is like a copilot for your AI team, coaxing the right behavior from agents, chatbots, and RAG applications. It blocks harmful outputs and security risks in real time, while continuously improving your prompts with feedback from users and subject-matter experts.

1. AI Evaluation

During development, you need to test a lot of variations to get the results you want. Let Galileo keep track of models and prompts, testing every combination to find the one that performs best (see the sketch after this list).

• Integrate with code or use our playground UI
• Build and execute golden test sets
• Debug quickly with traces that show latency and cost
• Evaluate AI output with accurate pre-built metrics, or create your own
• Refine your prompts quickly with ML-powered insights
• Organize your prompt versions in one place

2. AI Observability
3. Real-time Protection
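
As a sketch of the code-first path, here is a minimal evaluation loop over a golden test set. Everything in it (the TestCase shape, the stubbed app, the exact_match metric) is illustrative, not Galileo's documented SDK; it only shows the workflow the list above describes.

```python
# Minimal sketch of a code-first evaluation loop over a golden test set.
# Illustrative pseudocode for the workflow, not Galileo's SDK.
from dataclasses import dataclass

@dataclass
class TestCase:
    input: str
    expected: str

# A tiny "golden" test set: known-good inputs with expected answers.
golden_set = [
    TestCase("What is our refund window?", "30 days"),
    TestCase("Do you ship internationally?", "yes"),
]

def app_under_test(prompt: str) -> str:
    """Stand-in for your AI application (model call stubbed out)."""
    canned = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Do you ship internationally?": "Yes, we ship to over 40 countries.",
    }
    return canned.get(prompt, "")

def exact_match(output: str, expected: str) -> float:
    """A simple code-based metric: 1.0 if the expected answer appears."""
    return 1.0 if expected.lower() in output.lower() else 0.0

# Score every case; repeat per prompt/model variant to find the best one.
scores = [exact_match(app_under_test(c.input), c.expected) for c in golden_set]
print(f"accuracy: {sum(scores) / len(scores):.0%}")  # -> accuracy: 100%
```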

Adaptive

Powered by the Evaluation Engine

You can't improve your AI apps if you're flying blind. Galileo sheds light with an Evaluation Engine that brings the best of both worlds: prebuilt evaluators to get you started, and custom evaluators for your unique application. Galileo makes it easy to build (and improve) your perfect library of accurate evaluators and metrics.

Prebuilt metrics

Get started with over 20 out-of-the-box evaluators that are tested and accurate.

Custom metrics

Add code-based evaluators or automatically generate accurate LLM-as-judge evaluators just by typing a description.

Auto-tune

Improve evaluators with CLHF (Continuous Learning with Human Feedback), which optimizes evaluator prompts by adding few-shot examples (see the sketch after these cards).

Inference

Monitor AI in production with low-latency evaluators hosted on our purpose-built inference server.
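
To make the Custom metrics and Auto-tune cards concrete, here is a minimal sketch of an LLM-as-judge evaluator whose prompt is sharpened by appending human-corrected few-shot examples, the core idea behind CLHF. The prompt format and helper names are illustrative assumptions, not Galileo's internals.

```python
# Sketch: an LLM-as-judge evaluator improved with few-shot examples
# drawn from human feedback (the idea behind CLHF). The prompt format
# and plumbing are illustrative, not Galileo's implementation.
from dataclasses import dataclass

@dataclass
class Example:
    output: str   # model output a human reviewed
    verdict: str  # the human's corrected label, e.g. "pass" / "fail"
    reason: str   # why the human ruled that way

def build_judge_prompt(criteria: str, feedback: list[Example], output: str) -> str:
    """Compose a judge prompt; each human correction becomes a few-shot example."""
    shots = "\n".join(
        f"Output: {ex.output}\nVerdict: {ex.verdict} ({ex.reason})"
        for ex in feedback
    )
    return (
        f"You are an evaluator. Criteria: {criteria}\n"
        f"Labeled examples from human reviewers:\n{shots}\n"
        f"Now judge this output. Answer 'pass' or 'fail'.\n"
        f"Output: {output}\nVerdict:"
    )

# Each round of human feedback sharpens the judge without retraining.
feedback = [
    Example("Sure! Your SSN is 123-45-6789.", "fail", "leaks PII"),
    Example("I can't share personal data.", "pass", "refuses correctly"),
]
prompt = build_judge_prompt("Never reveal personal data.", feedback,
                            "Here is the customer's home address...")
print(prompt)  # send to your judge LLM of choice
```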

Scalable

Proprietary models for fast evaluation

Advanced evaluators usually come with a tradeoff: accuracy vs. speed. Galileo solves the problem with Luna models, serving evaluators that are accurate and low-latency (often under 200ms). This makes it possible to monitor production AI applications in near real time, fast enough to intercept outputs for safety and security (see the sketch after the benchmarks below).

Model                     Accuracy   Latency
Luna-2 (8b)               88%        214ms
Luna-2 (3b)               87%        167ms
Azure AI Content Safety   67%        312ms
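
To show what "fast enough to intercept" means in practice, here is a sketch of a real-time guardrail that only releases a response once a low-latency evaluator clears it within a latency budget. evaluate_toxicity is a stand-in for any fast hosted evaluator, not Galileo's API.

```python
# Sketch: using a low-latency evaluator as a real-time guardrail.
# `evaluate_toxicity` stands in for a fast hosted evaluator call;
# it is illustrative, not Galileo's documented API.
import time

LATENCY_BUDGET_MS = 250   # room for a ~214ms evaluator round trip
BLOCK_THRESHOLD = 0.8

def evaluate_toxicity(text: str) -> float:
    """Stand-in for a hosted evaluator; returns a risk score in [0, 1]."""
    time.sleep(0.2)  # simulate a ~200ms round trip
    return 0.1

def guarded_respond(draft: str) -> str:
    """Only release the model's draft if the evaluator clears it in time."""
    start = time.monotonic()
    score = evaluate_toxicity(draft)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > LATENCY_BUDGET_MS or score >= BLOCK_THRESHOLD:
        # Fail closed: over-budget or flagged content never reaches the user.
        return "Sorry, I can't help with that."
    return draft

print(guarded_respond("Here's how to reset your password..."))
```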

Online and offline

Easy to integrate

The Galileo platform is an end-to-end solution for experimentation, CI/CD, observability, and real-time monitoring. You can use it for offline or online evaluation. And it's easy to connect to your application layer with our SDKs and APIs.

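As a sketch of what connecting at the application layer can look like, here is a hypothetical tracing wrapper around a model call. The log_trace helper and its fields are assumptions for illustration, not the actual SDK surface.

```python
# Sketch: logging a trace (input, output, latency) around an LLM call.
# `log_trace` and its fields are hypothetical, shown only to illustrate
# the integration shape; use the real SDK or API in practice.
import json
import time

def log_trace(record: dict) -> None:
    """Stand-in for an SDK/API call that ships the trace to the platform."""
    print(json.dumps(record))

def call_model(prompt: str) -> str:
    """Stand-in for your model call."""
    return "Hello! How can I help?"

def traced_call(prompt: str) -> str:
    start = time.monotonic()
    output = call_model(prompt)
    log_trace({
        "input": prompt,
        "output": output,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
        "model": "gpt-4o",  # whatever your app uses
    })
    return output

traced_call("Hi there")
```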

[Platform diagram] Developers, Product Managers, Red Teams, and Subject Matter Experts connect through SDKs and APIs, which link models, orchestration, retrieval, and clouds to the Galileo AI Reliability Platform. The platform's Evaluation Engine (prebuilt evaluators, custom evaluators, CLHF, inference server) powers AI experimentation, CI/CD for AI, AI observability, and real-time guardrailing across prompts, datasets, traces and sessions, and application policies, all sitting on top of the AI Application Layer.