Today, we're thrilled to announce a significant milestone in Galileo's journey: our $45 million Series B funding round. This investment will propel our Evaluation Intelligence Platform to new heights, enabling more accurate and trustworthy AI for teams across the globe, including current customers and partners such as Twilio, Comcast, HP, and ServiceTitan.
This funding round, led by Scale Venture Partners, with participation from Databricks Ventures, Premji Invest, Amex Ventures, Citi Ventures, ServiceNow, and SentinelOne, comes on the heels of extraordinary growth for Galileo. Since the beginning of 2024, we’ve grown revenue by 834%, quadrupled our number of enterprise customers, and brought on six Fortune 50 companies.
We're also honored to welcome AI leaders Clement Delangue (CEO, Hugging Face) and Ankit Sobti (CTO, Postman) to the Galileo family alongside our existing investors, including Battery Ventures, Walden Capital, and Factory, who have continued their support in this funding round.
As generative AI adoption skyrockets across enterprises globally, we’re witnessing a democratization of AI capabilities. What was once a field reserved for specialized machine learning engineers and data scientists is now accessible to over 30 million software engineers. However, this rapid adoption, coupled with generative AI’s non-deterministic nature, exposes one of the industry’s greatest challenges: the lack of robust testing and measurement for AI accuracy, performance, and safety. As enterprises race forward with generative AI, adopting more advanced LLMs and more complex frameworks like RAG and agentic workflows, the measurement problem only becomes more pressing.
From Day 1, we have been focused on solving AI’s measurement problem. While leading AI efforts at Google AI, Google Brain, and Uber AI, we bonded over the lack of effective tooling and metrics to measure the quality of our models and training data. If Google and Uber hadn’t solved this problem, who would? We identified three key challenges with AI measurement.
Instilling trust in the next generation of AI would require the next generation of AI evaluation. With these challenges in mind, we founded Galileo 3.5 years ago to enable builders to fully harness the potential of language models at scale.
To solve the AI measurement problem, we have developed our Evaluation Intelligence Platform: a solution that embeds accurate evaluations directly into the AI development workflow, empowering teams with unprecedented visibility and control. With Evaluation Intelligence, teams can rapidly develop, rigorously test, continuously monitor, and securely deploy AI systems at scale.
Our Evaluation Intelligence Platform is built on three foundational pillars.
A new AI development workflow has emerged, one that prioritizes experimentation and iteration. We’ve built a comprehensive suite of products that supports builders across this new workflow, from fine-tuning LLMs to developing, testing, monitoring, and securing their AI applications. Each product is powered by our research-backed evaluation metrics, providing teams with a consistent and reliable measurement system that drives experimentation, facilitates debugging, and enhances collaboration.
You can’t solve the measurement problem without a robust system of measurement. Enter the Luna Evaluation Suite: our answer to this critical need. Luna is a collection of high-performance evaluation models designed to be accurate, fast, and cost-effective, capable of evaluating a wide range of factors, including hallucinations, retrieval efficacy, agent quality, and more. Our evaluation models work out of the box, requiring no ground truth data, so teams can start getting valuable insights immediately without the time-consuming process of test set preparation. Instead, Luna helps teams curate high-quality test sets over time.
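To make the reference-free idea concrete, here is a deliberately simplified sketch, our own illustration rather than how Luna’s models actually work: instead of comparing a response to a hand-labeled gold answer, you can score it against the retrieved context itself, so no labeled test set is needed to start.

```python
# Toy illustration of reference-free ("no ground truth") evaluation.
# This naive token-overlap heuristic is NOT Luna's method; it only shows the idea
# of scoring a response against its retrieved context instead of a gold answer.
import re

def context_support_score(response: str, context: str) -> float:
    """Fraction of the response's tokens that also appear in the context."""
    tokenize = lambda text: set(re.findall(r"[a-z0-9']+", text.lower()))
    response_tokens = tokenize(response)
    if not response_tokens:
        return 0.0
    return len(response_tokens & tokenize(context)) / len(response_tokens)

context = "Refunds are accepted within 30 days of purchase."
grounded = "You can get a refund within 30 days of purchase."
ungrounded = "All purchases come with a lifetime money-back guarantee."

print(context_support_score(grounded, context))    # ~0.5: much of the claim is supported by the context
print(context_support_score(ungrounded, context))  # 0.0: nothing in the claim appears in the context
```

Production-grade evaluation goes far beyond overlap heuristics like this, which is exactly where purpose-built evaluation models such as Luna come in.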
Every AI use case is different. That’s why we’ve developed our evaluation metrics to be highly adaptable.
By combining these three pillars, Evaluation Intelligence provides a comprehensive solution to the AI measurement challenge across the entire AI application development lifecycle. It empowers teams to develop AI systems with confidence, knowing they have the tools to ensure accuracy, safety, and reliability at every step of the journey.
Building production-ready products in the AI era requires a new approach to testing and evaluation. With this new funding, we are excited to accelerate the development of our platform and bring the benefits of Evaluation Intelligence to engineering teams worldwide.
Want to learn more? We invite you to join us at GenAI Productionize 2.0 on October 29, where we’ll dig deeper into Evaluation Intelligence alongside AI leaders from Writer, Cohere, NVIDIA, Twilio, Databricks, Unstructured.io, CrewAI, and many more. Register now.
We are just getting started. If solving AI’s measurement problem speaks to you, we’re looking for ambitious builders to join the movement!