Aggregate Luna 2 architecture proof. $30M+ in customer savings vs. LLM-as-judge baselines.
Why SLM judges change the math.
LLM-as-judge can't run on every call. SLM judges can. That's the unlock — comprehensive eval coverage at runtime cost, with one model that does two jobs.
Every call, evaluated.
SLM economics make 100% eval coverage affordable. No sampling. No blind spots.
Tuned to your traffic.
Generic evals miss your domain. A custom Luna SLM judge learns what good looks like on your data.
Same model, two jobs.
One trained artifact scores your evals and guards your runtime. Two layers of security from one model.
Guardrails on any behavior.
Block bad outputs in real time on the behaviors that matter — brand voice, scope, policy. Not just safety.
Fine-tune your own AI judges.
Four steps to a custom-trained Luna SLM judge that runs as both an eval and a guardrail. Days, not quarters.
Label 300 examples with the team that owns quality.
Start from annotated traces or a golden dataset. Human review defines what correct looks like on your data.
Expand the seed into a 3K-row training corpus.
A productized synthetic-data flow builds the full training set inside your environment. No data leaves the boundary.
LoRA fine-tune on the Luna 2 SLM. On your GPUs.
Single-token classification with log-probability scoring. F1 and AUC-ROC reported every training round.
Deploy as eval metric and runtime guardrail.
Appears alongside preset Luna metrics. The same trained artifact enforces policy at runtime via Luna Guardrails.
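The scoring and deployment steps above can be illustrated with a minimal sketch. It is not Luna's implementation or API: every function name, the two-label setup, and the 0.5 threshold are assumptions made for illustration. It assumes the judge emits one label token, that the log-probabilities of the positive and negative label tokens are available, and that the same normalized score feeds both the eval metrics (F1, AUC-ROC) and a runtime block decision.

```python
import math

def judge_score(logp_pos: float, logp_neg: float) -> float:
    """Turn the log-probs of two label tokens into P(positive), in [0, 1].

    Hypothetical helper: subtracting the max keeps exp() numerically stable.
    """
    m = max(logp_pos, logp_neg)
    e_pos = math.exp(logp_pos - m)
    e_neg = math.exp(logp_neg - m)
    return e_pos / (e_pos + e_neg)

def f1_score(labels, scores, threshold=0.5):
    """F1 of the positive class after thresholding the scores."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def auc_roc(labels, scores):
    """AUC-ROC as the chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def guardrail_blocks(score: float, threshold: float = 0.5) -> bool:
    """Runtime use of the same score: block when the violation score is high."""
    return score >= threshold

# Toy held-out set: 1 = policy violation, 0 = acceptable output.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
round_f1 = f1_score(labels, scores)
round_auc = auc_roc(labels, scores)
```

In this sketch the eval path aggregates scores over a labeled set after each training round, while the guardrail path applies the same score to a single response at request time. The threshold would be tuned per behavior rather than fixed at 0.5.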
"What Galileo is doing with their Luna small language models is amazing. A key step to having total, live in-production evaluations and guardrailing of your AI system."
Giovanna Carofiglio · Distinguished Engineer & Senior Director, Outshift by Cisco
See your cost savings with Luna 2 SLM judges.
Drop in your trace volume and your judge model. The calculator returns what you'd save.
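The arithmetic behind a calculator like this can be sketched as below. The page does not publish the calculator's formula or any prices, so the function name, the per-million-token pricing model, and every number in the example are placeholder assumptions, not Galileo or vendor figures.

```python
def monthly_judge_savings(traces_per_month: int,
                          tokens_per_trace: int,
                          llm_price_per_mtok: float,
                          slm_price_per_mtok: float) -> dict:
    """Compare the monthly cost of judging every trace with two judge models.

    Prices are in dollars per million tokens. All inputs are hypothetical.
    """
    mtok = traces_per_month * tokens_per_trace / 1_000_000
    llm_cost = mtok * llm_price_per_mtok
    slm_cost = mtok * slm_price_per_mtok
    return {
        "llm_cost": llm_cost,
        "slm_cost": slm_cost,
        "savings": llm_cost - slm_cost,
    }

# Placeholder inputs only: 10M traces a month, 1,000 judged tokens per trace,
# $2.00 vs. $0.02 per million tokens.
estimate = monthly_judge_savings(10_000_000, 1_000, 2.00, 0.02)
```

A real estimate would also account for GPU hosting, training runs, and any difference in prompt length between the two judges, none of which this sketch models.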
Train where your data lives.
Custom AI judges. Inside your environment. In days.
* vs. GPT-4.1 used as a judge at equivalent task quality on the Luna 2 paper benchmark suite. Comparisons against other frontier judges may yield different ratios.
