Understand the workflow of the Rogue evaluation process.
The LLM Service uses this context to generate a list of relevant test scenarios. You can review and edit these scenarios.
The Scenario Evaluation Service spins up the EvaluatorAgent, which begins a conversation with your agent for each scenario. You can watch this conversation happen live.
The LLM Service analyzes the results and generates a Markdown-formatted report, giving you a clear summary of your agent’s performance.
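
The flow described above (generate scenarios, hold one conversation per scenario, summarize in Markdown) can be sketched roughly as follows. This is an illustrative sketch only, not Rogue's actual API: `generate_scenarios`, `evaluate`, `render_report`, `run_evaluation`, and `ScenarioResult` are hypothetical names, the scenario generator is a hard-coded stand-in for an LLM call, and the pass/fail check is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScenarioResult:
    scenario: str
    passed: bool
    transcript: List[str]


def generate_scenarios(context: str) -> List[str]:
    # Hypothetical stand-in for the LLM Service: a real version would
    # prompt a model with the context instead of returning fixed strings.
    return [
        f"{context}: handles a routine request",
        f"{context}: refuses a disallowed request",
    ]


def evaluate(scenario: str, agent: Callable[[str], str]) -> ScenarioResult:
    # Hypothetical stand-in for the EvaluatorAgent: a single-turn
    # conversation with the agent under test.
    prompt = f"Scenario: {scenario}"
    reply = agent(prompt)
    passed = bool(reply.strip())  # placeholder check: any non-empty reply passes
    return ScenarioResult(scenario, passed, [prompt, reply])


def render_report(results: List[ScenarioResult]) -> str:
    # Build a small Markdown report with one table row per scenario.
    lines = ["# Evaluation report", "", "| Scenario | Result |", "| --- | --- |"]
    for r in results:
        lines.append(f"| {r.scenario} | {'pass' if r.passed else 'fail'} |")
    return "\n".join(lines)


def run_evaluation(context: str, agent: Callable[[str], str]) -> str:
    scenarios = generate_scenarios(context)            # step: scenario generation
    results = [evaluate(s, agent) for s in scenarios]  # step: one conversation each
    return render_report(results)                      # step: Markdown report
```

Calling `run_evaluation("Support bot", my_agent)`, where `my_agent` is any function that maps a prompt string to a reply string, returns the report as a Markdown string. Keeping the three stages as separate functions mirrors the separation between the services in the description, so each stage can be swapped out independently.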