Live Observability
You can watch the entire evaluation unfold in real time through a chat interface. This shows the exact interaction between the Evaluator Agent and your agent, giving you immediate insight into how your agent behaves.
Evaluation Report
Once the run is complete, a detailed report is generated. This report is available in two formats:
- UI Report: A user-friendly interface that presents a summary of the findings, a list of all test scenarios with their pass/fail status, and a full transcript of the conversation for each scenario.
- JSON Export: A machine-readable JSON file containing all the raw data from the evaluation. This is useful for integrating with other tools or for custom analysis.
Key Sections of the Report
- Summary: An overview of the evaluation, including the overall pass rate and a list of any failed scenarios.
- Scenario Details: For each test case, you can see the initial prompt, the agent’s response, the Evaluator Agent’s assessment, and whether the test passed or failed.
- Conversation Transcript: A complete log of the messages exchanged between the agents.
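As an illustration of the custom analysis the JSON Export makes possible, the sketch below computes an overall pass rate and lists the failed scenarios. The export schema is not documented in this section, so the field names used here (`scenarios`, `name`, `passed`, `transcript`) and the sample data are assumptions made for the example only; adjust them to match the actual file.

```python
import json

# Hypothetical export layout: the real JSON Export schema may differ.
# Field names ("scenarios", "name", "passed", "transcript") are assumptions.
SAMPLE_EXPORT = """
{
  "scenarios": [
    {"name": "greeting", "passed": true,
     "transcript": [{"role": "evaluator", "content": "Hello"},
                    {"role": "agent", "content": "Hi, how can I help?"}]},
    {"name": "refund request", "passed": false,
     "transcript": [{"role": "evaluator", "content": "I want a refund"},
                    {"role": "agent", "content": "I cannot help with that."}]},
    {"name": "order status", "passed": true,
     "transcript": [{"role": "evaluator", "content": "Where is my order?"},
                    {"role": "agent", "content": "It ships tomorrow."}]}
  ]
}
"""


def summarize(report: dict) -> dict:
    """Return the scenario count, pass rate, and names of failed scenarios."""
    scenarios = report.get("scenarios", [])
    # A scenario with no "passed" field is treated as failed.
    failed = [s["name"] for s in scenarios if not s.get("passed", False)]
    total = len(scenarios)
    pass_rate = (total - len(failed)) / total if total else 0.0
    return {"total": total, "pass_rate": pass_rate, "failed": failed}


# In practice you would load the exported file instead, for example:
#   with open("report.json") as f:   # hypothetical file name
#       report = json.load(f)
report = json.loads(SAMPLE_EXPORT)
summary = summarize(report)

print(f"Scenarios: {summary['total']}")
print(f"Pass rate: {summary['pass_rate']:.0%}")
print(f"Failed: {', '.join(summary['failed']) or 'none'}")
```

The same pattern extends to the transcript data, for example counting messages per scenario or searching agent responses for specific phrases.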