Evals: Test and Validate Your AI Agents with Confidence
You can now thoroughly test your AI agents with our powerful Evals feature, ensuring they perform exactly as expected in real-world scenarios!
Evals provides a structured testing framework for your AI agents, allowing you to create test suites with multiple scenarios to validate functionality. Create golden test examples, simulate user interactions, and evaluate agent performance without triggering actual tool executions or external actions.
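Conceptually, a test suite is just a named collection of golden examples. The sketch below is illustrative only, not Relevance AI's actual API or storage format (Evals is configured through the dashboard); the `GoldenExample` and `TestSuite` classes are hypothetical names chosen to show the shape of the data:

```python
from dataclasses import dataclass, field

@dataclass
class GoldenExample:
    """One golden test case: a simulated user message plus the
    behaviour the agent is expected to exhibit. No real tools are
    invoked when the scenario is replayed."""
    user_message: str
    expected_behavior: str  # criteria an evaluator will check against

@dataclass
class TestSuite:
    """Groups related golden examples so they can be run and tracked together."""
    name: str
    examples: list[GoldenExample] = field(default_factory=list)

# Example: a suite covering refund-related conversations.
refund_suite = TestSuite(
    name="refund-requests",
    examples=[
        GoldenExample(
            user_message="I was charged twice for my subscription.",
            expected_behavior="Acknowledges the duplicate charge and offers to escalate a refund.",
        ),
        GoldenExample(
            user_message="Can I get my money back? It's been 45 days.",
            expected_behavior="Explains the refund window policy without promising a refund.",
        ),
    ],
)
```

Pairing each simulated message with explicit expected behaviour is what lets scenarios be replayed and graded safely, without any real tool calls or external actions.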
➡️ Create organized test suites – Group related tests together for better organization and management
➡️ Generate realistic test scenarios – Automatically create diverse user interactions to thoroughly test your agents
➡️ Evaluate with AI judges – Use LLM-powered evaluators to assess whether your agent meets specific criteria (see the sketch after this list)
➡️ Track performance over time – Monitor how your agents perform across multiple test runs
➡️ Coming soon: Tool simulations – Test workflows with sensitive or expensive tools without triggering actual executions
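The "AI judge" evaluation above can be pictured as a second model grading the agent's reply against a test's criteria. Here is a minimal sketch of that general LLM-as-judge pattern, not Relevance AI's implementation; `llm_judge` and `call_llm` are hypothetical names, and `call_llm` stands in for whatever model client you use:

```python
def llm_judge(call_llm, agent_reply: str, criteria: str) -> bool:
    """Ask a judge model whether the agent's reply satisfies the criteria.

    `call_llm` is any function mapping a prompt string to a completion
    string (assumed to be supplied by your model client of choice).
    Returns True if the judge answers PASS.
    """
    prompt = (
        "You are grading an AI agent's reply.\n"
        f"Criteria: {criteria}\n"
        f"Agent reply: {agent_reply}\n"
        "Answer with exactly one word: PASS or FAIL."
    )
    return call_llm(prompt).strip().upper().startswith("PASS")
```

Constraining the judge to a PASS/FAIL answer keeps its output trivially parseable; in practice you would typically also record the judge's reasoning so failing runs can be reviewed later.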
With Evals, you can deploy agents with confidence, knowing they've been validated against a wide range of scenarios for reliability and consistent performance for your users.
Available exclusively to select Enterprise customers, Evals delivers the advanced testing capabilities needed for mission-critical AI deployments. To access Evals, go to your agent's dashboard, select the "Monitor" tab, then navigate to the "Test Suites" section. If you're on an Enterprise plan and don't yet have access, contact your Relevance AI sales representative to register interest.
Start building more reliable AI agents with comprehensive enterprise-grade testing today!
General fixes and UI improvements