After each evaluation run, Rogue provides a comprehensive set of results to help you understand your agent’s performance.

Live Observability

You can watch the entire evaluation process unfold in real time through a chat interface. This lets you see the exact interaction between the Evaluator Agent and your agent, giving you immediate insight into your agent’s behavior.

Evaluation Report

Once the run is complete, a detailed report is generated. This report is available in two formats:

  1. UI Report: A user-friendly interface that presents a summary of the findings, a list of all test scenarios with their pass/fail status, and a full transcript of the conversation for each scenario.
  2. JSON Export: A machine-readable JSON file containing all the raw data from the evaluation. This is useful for integrating with other tools or for custom analysis (see the sketch after this list).

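Because the export is plain JSON, it is straightforward to script against. The sketch below loads an export and computes the headline pass rate; note that the file name (`report.json`) and the field names (`scenarios`, `passed`, `description`) are assumptions for illustration, not a documented schema, so inspect a real export before relying on them.

```python
import json

# Load the JSON export from an evaluation run.
# NOTE: the file name and every field name below ("scenarios", "passed",
# "description") are illustrative assumptions; inspect your own export
# to confirm the actual schema.
with open("report.json", encoding="utf-8") as f:
    report = json.load(f)

scenarios = report.get("scenarios", [])
passed = sum(1 for s in scenarios if s.get("passed"))

print(f"Pass rate: {passed}/{len(scenarios)}")
for s in scenarios:
    if not s.get("passed"):
        print(f"FAILED: {s.get('description', '<no description>')}")
```
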
Key Sections of the Report

  • Summary: An overview of the evaluation, including the overall pass rate and a list of any failed scenarios.
  • Scenario Details: For each test case, you can see the initial prompt, the agent’s response, the Evaluator Agent’s assessment, and whether the test passed or failed.
  • Conversation Transcript: A complete log of the messages exchanged between the agents; the sketch below shows how each of these sections can be pulled out of the JSON export.

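These sections map directly onto the JSON export, so a failing scenario can be inspected end to end without opening the UI. Continuing with the same assumed schema as above (plus assumed `assessment`, `transcript`, `role`, and `content` fields), a minimal sketch might look like this:

```python
import json

with open("report.json", encoding="utf-8") as f:
    report = json.load(f)

# "scenarios", "passed", "assessment", and "transcript" are assumed field
# names, as are the per-message "role" and "content" keys.
for scenario in report.get("scenarios", []):
    if scenario.get("passed"):
        continue  # only inspect failures
    print(f"Scenario:   {scenario.get('description', '<unnamed>')}")
    print(f"Assessment: {scenario.get('assessment', '<none>')}")
    for message in scenario.get("transcript", []):
        print(f"  [{message.get('role', '?')}] {message.get('content', '')}")
```
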
By combining live observability with detailed reporting, Rogue gives you the visibility you need to confidently assess and improve your AI agent.