Luna-2
Real-time guardrails without the big-time bill
*Speak with sales to enable these metrics for you.
Luna metrics are only available to customers upon request.
Metrics
Enable production workflows
Enable simultaneous evaluation of multiple metrics with low latency and reduced costs, ideal for real-time, high-scale deployments.
Luna’s fine-tuned SLMs return verdicts in milliseconds at pennies per million tokens, which makes them well suited to always-on evaluation pipelines.
Model                  Cost per 1M tokens   Accuracy   Latency (Avg)   Max tokens
Luna-2                 $0.02                0.95       152ms           128k
GPT 4o                 $5.00                0.94       3200ms          128k
GPT 4o mini            $0.15                0.9        2600ms          128k
Azure Content Safety   $1.52                0.62       312ms           3k
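To put the per-token prices in context, here is a minimal Python sketch that turns the figures from the comparison above into a monthly bill. The prices and latencies are copied from the table; the workload size (2 billion tokens per month) and the helper names `monthly_cost_usd` and `cost_ratio` are assumptions made up for illustration.

```python
# Prices and average latencies copied from the comparison table.
MODELS = {
    "Luna-2": {"usd_per_1m_tokens": 0.02, "avg_latency_ms": 152},
    "GPT 4o": {"usd_per_1m_tokens": 5.00, "avg_latency_ms": 3200},
    "GPT 4o mini": {"usd_per_1m_tokens": 0.15, "avg_latency_ms": 2600},
    "Azure Content Safety": {"usd_per_1m_tokens": 1.52, "avg_latency_ms": 312},
}


def monthly_cost_usd(model: str, tokens_per_month: int) -> float:
    """Evaluation cost in USD for a given monthly token volume."""
    return MODELS[model]["usd_per_1m_tokens"] * tokens_per_month / 1_000_000


def cost_ratio(model: str, baseline: str = "Luna-2") -> float:
    """How many times more expensive `model` is than `baseline` per token."""
    return MODELS[model]["usd_per_1m_tokens"] / MODELS[baseline]["usd_per_1m_tokens"]


if __name__ == "__main__":
    tokens = 2_000_000_000  # hypothetical workload, not a figure from the page
    for name, stats in MODELS.items():
        print(
            f"{name:<22} ${monthly_cost_usd(name, tokens):>10,.2f}/month  "
            f"{cost_ratio(name):>6.1f}x Luna-2  {stats['avg_latency_ms']}ms avg"
        )
```

At that assumed volume the listed prices work out to about $40 per month for Luna-2 and $10,000 per month for GPT 4o, a 250x difference in per-token price.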
Comparison
Power agentic workflows
Luna models can power custom LLM metrics tailored to your application, for any production use case.
Category    Galileo Luna-2    Azure Content Safety    NVIDIA NeMo
Luna in action
Open up new use cases
Guardrail your agentic workflows so they behave like reliable partners. Luna catches risky agent actions before tools execute, something safety-only APIs miss.
Number Port Request
Telco Customer Support Chat
Sure, I’ll port the number now.
Tool called:
port number tool (number="9988776655")
Galileo guardrail triggered:
Tool selection quality - Cross-user action without consent or authority
Tool called:
verify_user_identity
Updating Response...
For security, please share your four-digit PIN or the last four digits of your SSN.
Transaction Over Limit
Financial Services Support Chat
Okay, initiating the transfer now.
Tool called:
initiate transfer tool (amount=10000000, account="10725432")
Galileo guardrail triggered:
Agent Flow - Does not follow the pre-defined allowed agent path
Tool called:
check_transfer_limit
Updating Response...
Transfer amount exceeds policy limit. Contact your bank to initiate transfer.
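The pattern in both chats (intercept the tool call, run a check, reroute to a safer tool) can be sketched in Python as follows. This is an illustration under stated assumptions, not Galileo's API: the `ToolCall` and `Verdict` types, the `agent_flow_guardrail` function, the tool implementations, and the `POLICY_LIMIT` value are all hypothetical, and a hand-written rule stands in for the model-scored metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""
    fallback: Optional[ToolCall] = None  # safer tool to run instead, if any


def agent_flow_guardrail(call: ToolCall, completed: List[str]) -> Verdict:
    # Hypothetical rule standing in for a model verdict: a transfer is only
    # on the allowed path once the limit check has already run.
    if call.name == "initiate_transfer" and "check_transfer_limit" not in completed:
        return Verdict(
            allowed=False,
            reason="Agent Flow: does not follow the pre-defined allowed agent path",
            fallback=ToolCall("check_transfer_limit", {"amount": call.args["amount"]}),
        )
    return Verdict(allowed=True)


POLICY_LIMIT = 10_000  # hypothetical limit, not taken from the page


def check_transfer_limit(amount: int) -> str:
    if amount > POLICY_LIMIT:
        return "Transfer amount exceeds policy limit. Contact your bank to initiate transfer."
    return "Transfer amount is within the policy limit."


def initiate_transfer(amount: int, account: str) -> str:
    return f"Transferred {amount} to account {account}."


TOOLS: Dict[str, Callable[..., str]] = {
    "check_transfer_limit": check_transfer_limit,
    "initiate_transfer": initiate_transfer,
}


def execute(call: ToolCall, completed: List[str]) -> str:
    """Evaluate the guardrail BEFORE the tool runs; reroute if it triggers."""
    verdict = agent_flow_guardrail(call, completed)
    if not verdict.allowed:
        if verdict.fallback is None:
            return f"Blocked: {verdict.reason}"
        call = verdict.fallback
    result = TOOLS[call.name](**call.args)
    completed.append(call.name)
    return result
```

Calling `execute(ToolCall("initiate_transfer", {"amount": 10000000, "account": "10725432"}), [])` runs `check_transfer_limit` instead of the transfer, mirroring the financial services chat. The key design point is that the check sits between the agent's decision and the tool's side effect.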
Fine-tuned
Inference optimized across the stack
Adapters on a shared core.
Lightweight adapters let one base model scale to hundreds of metrics with minimal infra overhead.
Millisecond multi-metric scoring.
Even when running 10–20 checks at once, Luna stays under 200 ms on L4 GPUs.
Proprietary engine.
Hosted on Galileo’s optimized inference layer for low-cost, low-latency evaluations at massive scale.
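From the caller's side, running many checks "at once" usually means fanning them out concurrently so that total wait time tracks the slowest check rather than the sum. The Python sketch below illustrates only that client-side idea with `asyncio.gather`. It says nothing about how Luna's adapters or inference engine work internally; the metric names, the simulated 20 ms latency, and the placeholder score are assumptions.

```python
import asyncio
import time
from typing import Dict, Tuple


async def run_check(name: str, simulated_latency_s: float) -> Tuple[str, float]:
    # asyncio.sleep stands in for a network call to a scoring service.
    await asyncio.sleep(simulated_latency_s)
    return name, 1.0  # placeholder score


async def score_all(checks: Dict[str, float]) -> Dict[str, float]:
    # Launch every check concurrently and wait for all of them to finish.
    results = await asyncio.gather(
        *(run_check(name, latency) for name, latency in checks.items())
    )
    return dict(results)


def timed_score_all(checks: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Run all checks and return (scores, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    scores = asyncio.run(score_all(checks))
    return scores, time.perf_counter() - start


if __name__ == "__main__":
    # Ten hypothetical metrics, each simulated at 20 ms.
    checks = {f"metric_{i}": 0.02 for i in range(10)}
    scores, elapsed = timed_score_all(checks)
    print(f"{len(scores)} checks in {elapsed * 1000:.0f} ms")
```

Run sequentially, the ten simulated checks would take about 200 ms; run concurrently, the elapsed time stays close to a single check's latency.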
Ready to start?
Get started in minutes with our free developer tier, or explore our enterprise features in a guided demo.
Flexible pricing
Start for free and upgrade when you're ready to customize your evaluations and scale your AI applications to production.
Learn more
See how companies like Twilio and Comcast are achieving reliable AI with Galileo, and explore the platform’s capabilities for yourself.



