Evaluating Multi-Agent Systems with CrewAI + Galileo

Atindriyo Sanyal

Co-founder and CPO

Multi-agent systems promise specialized expertise and parallel processing, but they are hard to debug. When you build on CrewAI's multi-agent framework, you need comprehensive evaluations to ensure your systems stay reliable in production.

Join this workshop, where Galileo Product Manager Xian Ke and CrewAI’s Director of Product Marketing, Shane Johnson, will demonstrate how to evaluate multi-agent systems so that they actually work in production. You'll learn a practical, metric-driven approach to preventing failures by instrumenting agents to monitor action completion, tool selection, latency, and user satisfaction.

We'll walk through a real-world CrewAI implementation and show how observability enables root cause analysis and systematic fixes. You'll see exactly where agents lose context during handoffs, when tool selection breaks down, and how to streamline your architecture.

Watch our webinar to learn:

How to apply an AI eval playbook purpose-built for multi-agent challenges
How to trace root causes across agent handoffs with session, step, and system-level metrics
How to use CrewAI’s orchestration framework with Galileo's observability platform to create reliable multi-agent systems
