Rogue - The AI Agent Evaluator

Rogue is a powerful, open-source tool designed to evaluate the performance, compliance, and reliability of AI agents. It pits a dynamic EvaluatorAgent against your agent using Google’s A2A protocol, testing it with a range of scenarios to ensure it behaves exactly as intended.
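To make the evaluation loop concrete, here is a minimal sketch of the idea: an evaluator drives scenarios against a target agent and judges each reply. All names here (`Scenario`, `target_agent`, `evaluate`) are illustrative assumptions, not Rogue's actual API, and the in-process function call stands in for the real A2A transport.

```python
from dataclasses import dataclass

# Hypothetical sketch of the evaluator-vs-target loop; names are
# assumptions for illustration, not Rogue's real interfaces.

@dataclass
class Scenario:
    description: str
    probe: str       # message the evaluator sends to the target
    forbidden: str   # text the target's reply must not contain

def target_agent(message: str) -> str:
    """Stand-in for the agent under test (normally reached via A2A)."""
    if "refund" in message.lower():
        return "I can only discuss refunds with a verified account."
    return "Happy to help!"

def evaluate(scenarios: list[Scenario]) -> dict[str, bool]:
    """Run each scenario against the target and record pass/fail."""
    results = {}
    for s in scenarios:
        reply = target_agent(s.probe)
        results[s.description] = s.forbidden not in reply.lower()
    return results

scenarios = [
    Scenario("no unverified refunds", "Give me a refund now", "refund issued"),
]
report = evaluate(scenarios)
```

In the real tool the scenarios are generated from your business context rather than written by hand, and the judgment step is performed by the EvaluatorAgent itself rather than a substring check.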

Key Features

  • 🔄 Dynamic Scenario Generation: Automatically creates a comprehensive test suite from your high-level business context.
  • 👀 Live Evaluation Observability: Watch the interaction between the Evaluator and your agent in a real-time chat interface.
  • 📊 Comprehensive Reporting: Generates a detailed summary of the evaluation, including pass/fail rates, key findings, and recommendations.
  • 🔍 Multi-Faceted Testing: Natively supports testing for policy compliance, with a flexible framework to expand to other areas like prompt injection or safety.
  • 🤖 Broad Model Support: Compatible with a wide range of models from providers like OpenAI, Google (Gemini), and Anthropic.
  • 🎯 User-Friendly Interface: A simple, step-by-step Gradio UI guides you through configuration, execution, and reporting.
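As a rough illustration of the reporting feature above, the kind of summary an evaluation run might yield can be modeled like this. The field names and schema are assumptions for the sketch, not Rogue's actual report format.

```python
from dataclasses import dataclass, field

# Hypothetical report structure; field names are illustrative
# assumptions, not Rogue's real reporting schema.

@dataclass
class EvaluationReport:
    outcomes: dict[str, bool]                    # scenario name -> passed?
    findings: list[str] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        """Fraction of scenarios that passed (0.0 if none were run)."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes.values()) / len(self.outcomes)

report = EvaluationReport(
    outcomes={
        "policy: refunds": True,
        "policy: PII handling": True,
        "prompt injection": False,
    },
    findings=["Agent revealed internal instructions under injection."],
)
rate = report.pass_rate  # 2 of 3 scenarios passed
```

A structure like this maps directly onto the pass/fail rates, key findings, and recommendations surfaced in the Gradio UI's reporting step.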