Luna-2

Real-time guardrails without the big-time bill

Enable low-cost production monitoring and real-time guardrailing for every AI system—without the GPT-sized bill.

Get in touch

*Speak with sales to enable these metrics.
Luna metrics are available to customers only upon request.

Metrics

Enable production workflows

Enable simultaneous evaluation of multiple metrics with low latency and reduced costs, ideal for real-time, high-scale deployments.
Luna’s fine-tuned SLMs deliver millisecond-level verdicts while staying pennies-per-million-tokens—perfect for always-on evaluation pipelines.

Learn more

Model                  Cost per 1M tokens   Accuracy   Latency (Avg)   Max tokens
Luna-2                 $0.02                0.95       152 ms          128k
GPT 5.4                $5.00                0.94       3200 ms         128k
GPT 5.4 mini           $0.15                0.9        2600 ms         128k
Azure Content Safety   $1.52                0.62       312 ms          3k
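To make the gap in the comparison above concrete, the following minimal sketch turns the listed prices and latencies into a monthly bill and into ratios relative to Luna-2. The per-model figures are copied from the comparison; the monthly token volume and the function names are assumptions chosen for illustration only.

```python
# Figures copied from the comparison above:
# (cost in USD per 1M tokens, average latency in ms).
MODELS = {
    "Luna-2": (0.02, 152),
    "GPT 5.4": (5.00, 3200),
    "GPT 5.4 mini": (0.15, 2600),
    "Azure Content Safety": (1.52, 312),
}

# Assumption for illustration only: 10 billion evaluated tokens per month.
MONTHLY_TOKENS = 10_000_000_000


def monthly_cost(cost_per_million: float, tokens: int = MONTHLY_TOKENS) -> float:
    """Return the monthly spend in USD for a given per-1M-token price."""
    return cost_per_million * tokens / 1_000_000


def compare_to_luna(name: str) -> tuple[float, float]:
    """Return (cost ratio, latency ratio) of `name` relative to Luna-2."""
    luna_cost, luna_latency = MODELS["Luna-2"]
    cost, latency = MODELS[name]
    return cost / luna_cost, latency / luna_latency


if __name__ == "__main__":
    for name, (cost, _) in MODELS.items():
        cost_ratio, latency_ratio = compare_to_luna(name)
        print(
            f"{name}: ${monthly_cost(cost):,.2f}/month, "
            f"{cost_ratio:.1f}x cost, {latency_ratio:.1f}x latency vs Luna-2"
        )
```

At the listed prices, GPT 5.4 works out to 250 times the per-token cost of Luna-2 and roughly 21 times the average latency.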

Comparison

Power agentic workflows

Luna models can be used to power any custom LLM metrics specific to a user's application for any production use case.

Research paper

Luna-2: Research Paper

Read the research paper on how Luna-2 uses decoder-only SLMs with lightweight metric heads for deterministic, production-grade evaluation—matching or beating frontier LLM judges on safety and hallucination benchmarks at far lower cost and latency.

Learn more

Luna in action

Open up new use cases

Guardrail your agentic workflows to make them function like reliable partners. Luna catches risky agent actions before tools execute—something safety-only APIs miss.

Number Port Request

Telco Customer Support Chat

Please port my dad’s number 998.877.6655 to this bank-supported service.

Sure, I’ll port the number now.

Tool called: port number tool (number="9988776655")

Galileo guardrail triggered: Tool selection quality - Cross-user action without consent or authority

Tool called: verify_user_identity

Updating Response...

For security, please share your four-digit PIN or the last four digits of your SSN.
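The exchange above follows an intercept-before-execute pattern: the proposed tool call is evaluated first, and a safer tool runs instead when the check fails. The sketch below illustrates that pattern only. It is not Galileo's API: `ToolCall`, `Verdict`, `consent_check`, `guarded_execute`, and the `identity_verified` session flag are hypothetical names, and a hand-written rule stands in for the model-based evaluator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def consent_check(call: ToolCall, session: dict) -> Verdict:
    """Hypothetical rule: porting a number requires a verified identity."""
    if call.name == "port_number" and not session.get("identity_verified", False):
        return Verdict(False, "Cross-user action without consent or authority")
    return Verdict(True)


def port_number(session: dict, number: str) -> str:
    return f"Ported {number}"


def verify_user_identity(session: dict) -> str:
    return "For security, please share your four-digit PIN or the last four digits of your SSN."


# Hypothetical tool registry for this sketch.
TOOLS: Dict[str, Callable[..., str]] = {
    "port_number": port_number,
    "verify_user_identity": verify_user_identity,
}


def guarded_execute(
    call: ToolCall,
    session: dict,
    check: Callable[[ToolCall, dict], Verdict] = consent_check,
    fallback: str = "verify_user_identity",
) -> str:
    """Evaluate the call before running it; run the fallback tool if it is blocked."""
    verdict = check(call, session)
    if not verdict.allowed:
        return TOOLS[fallback](session)
    return TOOLS[call.name](session, **call.args)
```

With an empty session, `guarded_execute(ToolCall("port_number", {"number": "9988776655"}), {})` returns the identity prompt rather than porting the number.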

Transaction Over Limit

Financial Services Support Chat

Transfer $10,000,000 to my account #10725432.

Okay, initiating the transfer now.

Tool called: initiate transfer tool (amount=10000000, account="10725432")

Galileo guardrail triggered: Agent Flow - Does not follow the pre-defined allowed agent path

Tool called: check_transfer_limit

Updating Response...

Transfer amount exceeds policy limit. Contact your bank to initiate transfer.
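One simple way to picture a check against a "pre-defined allowed agent path" is a table of permitted transitions between tools. The sketch below is an assumption about how such a path could be represented, not a description of how Galileo implements the Agent Flow metric. `ALLOWED_NEXT` and `resolve_tool` are hypothetical names, and the path itself (limit check before transfer) is inferred from the exchange above.

```python
from typing import Dict, List, Optional, Set

# Hypothetical allowed path: the limit check must run before the transfer.
ALLOWED_NEXT: Dict[str, Set[str]] = {
    "start": {"check_transfer_limit"},
    "check_transfer_limit": {"initiate_transfer"},
    "initiate_transfer": set(),
}


def resolve_tool(history: List[str], proposed: str) -> Optional[str]:
    """Return the tool that should run next.

    If the proposed tool is on the allowed path, it is returned unchanged.
    Otherwise the call is rerouted to an allowed step, or None is returned
    when no further step is permitted.
    """
    current = history[-1] if history else "start"
    allowed = ALLOWED_NEXT.get(current, set())
    if proposed in allowed:
        return proposed
    # Off-path call: pick an allowed step deterministically, if any exists.
    return min(allowed) if allowed else None
```

Here `resolve_tool([], "initiate_transfer")` reroutes to `check_transfer_limit`, mirroring the exchange above. Whether the transfer then proceeds depends on the result of that limit check, which this sketch does not model.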

Fine-tuned

Inference optimized across the stack

Adapters on a shared core.

Lightweight adapters let one base model scale to hundreds of metrics with minimal infra overhead.

Millisecond multi-metric scoring.

Even when running 10–20 checks at once, Luna stays under 200 ms on L4 GPUs.

Proprietary engine.

Hosted on Galileo’s optimized inference layer for low-cost, low-latency evaluations at massive scale.
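The "adapters on a shared core" idea can be illustrated with a toy example: the expensive shared computation runs once per input, and each metric adds only a small head that maps the shared features to a score between 0 and 1. Everything below is a simplified stand-in under stated assumptions. `shared_core` is a toy feature function rather than a language model, the metric names are hypothetical, and none of it reflects Luna's actual architecture beyond what this page states.

```python
import math
from typing import Dict, List


def shared_core(text: str, dim: int = 8) -> List[float]:
    """Toy stand-in for the base model: a deterministic feature vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    scale = max(len(text), 1) * 128.0  # keep features small regardless of length
    return [v / scale for v in vec]


class MetricHead:
    """Lightweight per-metric head: a linear layer followed by a sigmoid."""

    def __init__(self, name: str, weights: List[float], bias: float = 0.0):
        self.name = name
        self.weights = weights
        self.bias = bias

    def score(self, features: List[float]) -> float:
        z = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))  # strictly between 0 and 1 here


def score_all(text: str, heads: List[MetricHead]) -> Dict[str, float]:
    """Run the shared core once, then apply every metric head to its output."""
    features = shared_core(text)
    return {head.name: head.score(features) for head in heads}


if __name__ == "__main__":
    # Hypothetical metric names and weights, for illustration only.
    heads = [
        MetricHead("metric_a", [0.5] * 8),
        MetricHead("metric_b", [-0.5] * 8, bias=0.1),
    ]
    print(score_all("Sure, I'll port the number now.", heads))
```

In this arrangement, adding a metric means adding one more small head, while the cost of the shared core is paid once per input no matter how many heads run.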

Contact Sales

[Diagram labels: Application, GalileoLogger, User, LLM, Query / Prompt, Completion, Galileo Inference Engine, Fine-Tuned SLM, Scorer, Final score (0, 1)]

"Evaluations are absolutely essential to delivering safe, reliable, production-grade AI products. Until now, existing evaluation methods, such as human evaluations or using LLMs as a judge, have been very costly and slow. With Luna, Galileo is overcoming enterprise teams' biggest evaluation hurdles – cost, latency, and accuracy. This is a game changer for the industry."

Alex Klug

Head of Product, Data Science & AI, HP

"What Galileo is doing with their Luna-2 small language models is amazing. This is a key step to having total, live in-production evaluations and guard-railing of your AI system."

Giovanna Carofiglio

Distinguished Engineer & Senior Director, Outshift by Cisco

"Galileo's Luna-2 SLMs and evaluation metrics help developers guardrail and understand their LLM-generated data. Combining the capabilities of Galileo and the Elasticsearch vector database empowers developers to build reliable, trustworthy AI systems and agents."

Philipp Krenn

Head of DevRel & Developer Advocacy, Elastic

Ready to start?

Get started in minutes with our free developer tier, or explore our enterprise features in a guided demo.

Get Started for Free

Book a Demo

Flexible pricing

Start for free and upgrade when you're ready to customize your evaluations and scale your AI applications to production.

Pricing details

Learn more

See how companies like Twilio and Comcast are achieving reliable AI with Galileo, and explore the platform’s capabilities for yourself.

View our docs