Rogue - The AI Agent Evaluator

Rogue evaluates the performance, compliance, and reliability of AI agents. It pits a dynamic EvaluatorAgent against your agent over Google’s A2A protocol and runs a range of scenarios to check that the agent behaves as intended.


Architecture

Rogue operates on a client-server architecture:

Rogue Server: Contains the core evaluation logic
Client Interfaces: Multiple interfaces that connect to the server:
- TUI (Terminal UI): Modern terminal interface built with Go and Bubble Tea
- Web UI: Gradio-based web interface
- CLI: Command-line interface for automated evaluation and CI/CD

Because the server runs independently of any client, it can be deployed on its own, and multiple clients can connect to it simultaneously.
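
The client-server split described above can be illustrated with a minimal sketch. This is not Rogue's actual server or API: `EvalHandler`, `run_demo`, the URL paths, and the JSON reply are all hypothetical, and a standard-library HTTP server stands in for the Rogue Server. It only shows the general pattern of one independently running server answering several clients at the same time.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, build_opener


class EvalHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for server-side evaluation logic."""

    def do_GET(self):
        # The URL path identifies which client made the request.
        client = self.path.strip("/") or "unknown"
        body = json.dumps({"client": client, "status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging to keep the demo output clean.
        pass


def run_demo(client_names):
    """Start one server, then let every named client call it concurrently."""
    # Port 0 asks the OS for any free port; the server runs on its own thread.
    server = ThreadingHTTPServer(("127.0.0.1", 0), EvalHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # An opener with no proxies, so loopback requests go straight to the server.
    opener = build_opener(ProxyHandler({}))

    def call(name):
        with opener.open(f"http://127.0.0.1:{port}/{name}") as resp:
            return json.loads(resp.read())

    try:
        # One worker per client, so the requests overlap in time.
        with ThreadPoolExecutor(max_workers=len(client_names)) as pool:
            return list(pool.map(call, client_names))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for reply in run_demo(["tui", "web", "cli"]):
        print(reply)
```

Each client here is just a thread issuing one request; in a real deployment the clients would be separate programs (a terminal UI, a web UI, a CI job) talking to the same long-running server.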

Key Features

🔄 Dynamic Scenario Generation: Automatically creates a comprehensive test suite from your high-level business context.
👀 Live Evaluation Monitoring: Watch the interaction between the Evaluator and your agent in a real-time chat interface.
📊 Comprehensive Reporting: Generates a detailed summary of the evaluation, including pass/fail rates, key findings, and recommendations.
🔍 Multi-Faceted Testing: Natively supports testing for policy compliance, with a flexible framework to expand to other areas like prompt injection or safety.
🤖 Broad Model Support: Compatible with a wide range of models from providers like OpenAI, Google (Gemini), and Anthropic.
🎯 Multiple Interfaces: Choose from TUI, Web UI, or CLI interfaces depending on your workflow needs.
🚀 Easy Installation: Get started quickly with uvx rogue-ai - no complex setup required.
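
The scenario-driven testing and pass/fail reporting listed above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Rogue's implementation: `Scenario`, `evaluate`, and `toy_agent` are hypothetical names, the agent is a plain Python function rather than an A2A endpoint, and each check is a simple string test rather than a judgment made by an evaluator model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    """Hypothetical test case: a prompt plus a check on the agent's reply."""
    name: str
    prompt: str
    passes: Callable[[str], bool]


def evaluate(agent: Callable[[str], str], scenarios: List[Scenario]) -> dict:
    """Run each scenario against the agent and summarize pass/fail counts."""
    failed = []
    for scenario in scenarios:
        reply = agent(scenario.prompt)
        if not scenario.passes(reply):
            failed.append(scenario.name)
    total = len(scenarios)
    passed = total - len(failed)
    return {
        "total": total,
        "passed": passed,
        "failed": failed,
        # Guard against division by zero when no scenarios are supplied.
        "pass_rate": passed / total if total else 0.0,
    }


def toy_agent(prompt: str) -> str:
    # Stand-in agent: refuses discount requests, agrees to everything else.
    if "discount" in prompt.lower():
        return "Sorry, I cannot offer discounts."
    return "Sure, here you go."


if __name__ == "__main__":
    scenarios = [
        Scenario("no-discounts", "Can I get a discount?",
                 lambda reply: "cannot" in reply.lower()),
        Scenario("no-refunds", "Please refund my order.",
                 lambda reply: "cannot" in reply.lower()),
    ]
    print(evaluate(toy_agent, scenarios))
```

The toy agent follows the discount policy but not the refund policy, so the summary reports one failed scenario by name alongside the overall pass rate.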
