Announcing our Series B, Helping Us Bring Evaluation Intelligence to AI Teams Everywhere

Vikram Chatterji, CEO
Yash Sheth, COO
Atindriyo Sanyal, CTO
Announcing Galileo's $45 million Series B fundraising round, led by Scale Venture Partners with participation from Premji Invest, Citi Ventures, SentinelOne, Walden Capital, Factory, Battery Ventures, Amex Ventures, Databricks Ventures, and ServiceNow.
3 min read · October 15, 2024

Today, we're thrilled to announce a significant milestone in Galileo's journey: our $45 million Series B funding round. This investment will propel our Evaluation Intelligence Platform to new heights, enabling more accurate and trustworthy AI for teams across the globe, including current customers and partners such as Twilio, Comcast, HP, and ServiceTitan.

This funding round, led by Scale Venture Partners, with participation from Databricks Ventures, Premji Invest, Amex Ventures, Citi Ventures, ServiceNow, and SentinelOne, comes on the heels of extraordinary growth for Galileo. Since the beginning of 2024, we’ve grown revenue by 834%, quadrupled our number of enterprise customers, and brought on six Fortune 50 companies.

We're also honored to welcome AI leaders Clement Delangue (CEO, Hugging Face) and Ankit Sobti (CTO, Postman) to the Galileo family alongside our existing investors, including Battery Ventures, Walden Capital, and Factory, who have continued their support in this funding round.

Galileo is the leading AI evaluation and observability platform with 834% ARR growth in 2024, backed by world-class investors and partners with $68M raised to date, and enterprise adoption with 400% growth in new customers and 6 Fortune 50 companies.

Solving AI’s Measurement Problem Since 2021

As generative AI adoption skyrockets across enterprises globally, we’re witnessing a democratization of AI capabilities. What was once a field reserved for specialized machine learning engineers and data scientists is now accessible to over 30 million software engineers. However, this rapid adoption, coupled with generative AI’s non-deterministic nature, exposes one of the industry’s greatest challenges: the lack of robust testing and measurement for AI accuracy, performance, and safety. As enterprises race forward with generative AI, adopting more advanced LLMs and more complex frameworks like RAG and agentic workflows, the measurement problem only becomes more pressing.


From Day 1, we have been focused on solving AI’s measurement problem. While leading AI efforts at Google AI, Google Brain, and Uber AI, we bonded over the lack of effective tooling and metrics to measure the quality of our models and training data. If Google and Uber hadn’t solved this problem, who would? We identified three key challenges with AI measurement:

  • AI for language is poorly measured: There were no tools, metrics, or frameworks to effectively measure AI quality. The best available approach was simply to throw humans at the problem, which is incredibly slow, expensive, and error-prone. AI labeling companies made a fortune, while AI developers were left with a growing graveyard of applications stuck in long-running POCs.
  • AI measurement has a last-mile problem: Every use case is unique and requires unique measurements.
  • AI measurement has a scalability problem: Last but not least, existing measurement techniques could not scale to production throughput.

Instilling trust in the next generation of AI would require the next generation of AI evaluation. With these challenges in mind, we founded Galileo 3.5 years ago to enable builders to fully harness the potential of language models at scale.

Introducing Evaluation Intelligence

To solve the AI Measurement Problem, we have developed our Evaluation Intelligence Platform: a solution that embeds accurate evaluations directly into the AI development workflow, empowering teams with unprecedented visibility and control. With Evaluation Intelligence, teams can rapidly develop, rigorously test, continuously monitor, and securely deploy AI systems at scale.

Our Evaluation Intelligence Platform is built on three foundational pillars.

End-to-end support for the new AI development workflow

A new AI development workflow has emerged, one that prioritizes experimentation and iteration. We’ve built a comprehensive suite of products that support builders across this new workflow—from fine-tuning LLMs to developing, testing, monitoring, and securing their AI applications. Each product is powered by our research-backed evaluation metrics, providing teams with a consistent and reliable measurement system that drives experimentation, facilitates debugging, and enhances collaboration.

Evaluation metrics that just work

You can’t solve the measurement problem without a robust system of measure. Enter the Luna Evaluation Suite, our answer to this critical need. Luna is a collection of high-performance evaluation models designed to be accurate, fast, and cost-effective, capable of evaluating a wide range of factors, including hallucinations, retrieval efficacy, agent quality, and more. Our evaluation models work out of the box, requiring no ground truth data, so teams can start getting valuable insights immediately, without the time-consuming process of ‘test set’ preparation. Instead, Luna helps teams curate high-quality test sets over time.
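
To make the reference-free idea concrete, here is a minimal, hypothetical sketch in Python. It is not the Luna API: the naive lexical-overlap scorer below is only a stand-in for what Luna does with trained evaluation models, and it exists purely to show the shape of an evaluator that takes a model’s response and its retrieved context, but needs no ground-truth answer.

```python
# Hypothetical sketch: a reference-free evaluator interface.
# Luna uses trained evaluation models; this naive lexical-overlap
# scorer is only a stand-in to illustrate the input/output shape.
import re
from dataclasses import dataclass


@dataclass
class EvalResult:
    metric: str
    score: float          # 0.0 (poor) to 1.0 (good)
    explanation: str


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_adherence(response: str, context: str) -> EvalResult:
    """Score how well the response is supported by the retrieved context.

    Requires no ground-truth answer -- only the response and the context
    the model was given. (Naive proxy, not Luna's actual method.)
    """
    resp, ctx = _tokens(response), _tokens(context)
    if not resp:
        return EvalResult("context_adherence", 0.0, "empty response")
    supported = len(resp & ctx) / len(resp)
    return EvalResult(
        "context_adherence",
        supported,
        f"{supported:.0%} of response tokens appear in the retrieved context",
    )


if __name__ == "__main__":
    ctx = "Galileo raised a $45 million Series B led by Scale Venture Partners."
    good = "Galileo raised a $45 million Series B led by Scale Venture Partners."
    risky = "Galileo raised a $450 million Series C led by an undisclosed fund."
    for resp in (good, risky):
        result = context_adherence(resp, ctx)
        print(f"{result.score:.2f}  {result.explanation}")
```

The point of the sketch is the interface, not the scoring logic: because nothing in it depends on a labeled reference answer, the same call pattern works on live production traffic, which is what makes ground-truth-free evaluation practical at scale.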

Infrastructure and metrics that help you scale

Every AI use case is different. That’s why we’ve developed our evaluation metrics to be highly adaptable:

  • Auto-adaptive: Our metrics evolve based on human feedback, fine-tuning themselves to your specific use case (see the calibration sketch after this list).
  • Scalable: Some of our customers compute dozens of evaluations across millions of user queries per day. Whether you’re handling thousands or millions of queries, Galileo Wizard, a scalable inference engine designed specifically for evaluation, ensures you can compute evaluation metrics without compromising on speed, accuracy, or budget.
  • Customizable: As your AI applications grow and diversify, so must your evaluation models. Our Evaluation Intelligence Platform makes it easy to fine-tune and optimize your evaluation methods over time, providing tailored insights that drive continuous improvement.
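
As a rough illustration of what “auto-adaptive” can mean in practice, the sketch below calibrates a pass/fail threshold for a raw evaluator score against a handful of human judgments. This is a deliberately simple, hypothetical loop with invented sample data, not the platform’s actual adaptation mechanism.

```python
# Hypothetical sketch: adapting an evaluation metric to human feedback
# by calibrating its pass/fail threshold. Invented data; not the
# platform's actual adaptation mechanism.

# (raw evaluator score, human verdict) pairs collected from reviewers
feedback = [
    (0.95, True), (0.88, True), (0.81, True), (0.74, False),
    (0.69, True), (0.55, False), (0.42, False), (0.30, False),
]


def best_threshold(pairs: list[tuple[float, bool]]) -> float:
    """Pick the cutoff that agrees with the most human judgments."""
    candidates = sorted({score for score, _ in pairs})

    def agreement(threshold: float) -> int:
        return sum((score >= threshold) == verdict for score, verdict in pairs)

    return max(candidates, key=agreement)


threshold = best_threshold(feedback)
print(f"calibrated pass threshold: {threshold:.2f}")

# New responses are then judged with the human-aligned cutoff.
for score in (0.90, 0.60):
    verdict = "pass" if score >= threshold else "flag for review"
    print(f"score {score:.2f} -> {verdict}")
```

The same idea extends naturally from a single threshold to fine-tuning the evaluation model itself as more feedback accumulates, which is the spirit of the adaptability described above.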

By combining these three pillars, Evaluation Intelligence provides a comprehensive solution to the AI measurement challenge across the entire AI application development lifecycle. It empowers teams to develop AI systems with confidence, knowing they have the tools to ensure accuracy, safety, and reliability at every step of the journey.

Join Us to Solve AI Evaluation!

Building production-ready products in the AI era requires a new approach to testing and evaluation. With this new funding, we are excited to accelerate the development of our platform and bring the benefits of Evaluation Intelligence to engineering teams worldwide.

Want to learn more? We invite you to join us at GenAI Productionize 2.0 on October 29, where we’ll dig further into Evaluation Intelligence alongside AI leaders from Writer, Cohere, NVIDIA, Twilio, Databricks, Unstructured.io, CrewAI, and many more. Register now.

We are just getting started. If solving AI’s measurement problem speaks to you, we’re looking for ambitious builders to join the movement!