How It Works

Client-Server Architecture

Rogue operates on a client-server architecture that separates the core evaluation logic from the user interfaces:

Rogue Server: The backend that handles all evaluation logic, scenario generation, red teaming, and agent interactions.
Client Interfaces: Several ways to interact with the server:
- TUI (Terminal UI): Modern terminal interface built with Go and Bubble Tea
- Web UI: Gradio-based web interface for browser-based interaction
- CLI: Command-line interface for automation and CI/CD pipelines

This separation allows flexible deployment: the server can run on its own, and multiple clients can connect to it simultaneously.

Two Evaluation Modes

Rogue offers two complementary evaluation modes:

1. Policy Evaluation

Tests whether your agent follows its intended business logic and policies. Workflow:

1. Configure: Provide the agent endpoint, authentication, and LLM settings
2. Generate Scenarios: Input business context to auto-generate test scenarios
3. Run & Evaluate: EvaluatorAgent conducts a conversation for each scenario
4. View Report: Get a summary of policy compliance with pass/fail rates

2. Red Team Security Testing

Tests your agent’s resistance to adversarial attacks and security vulnerabilities. Workflow:

1. Configure: Select scan type (Basic, Full, or Custom) and target vulnerabilities
2. Attack Execution: Red Team Orchestrator applies 30+ attack techniques
3. Evaluate Responses: LLM judges detect successful exploits
4. Risk Assessment: Calculate CVSS-based scores and map to compliance frameworks

┌─────────────────────────────────────────────────────────────┐
│                      Red Team Workflow                      │
├─────────────────────────────────────────────────────────────┤
│  Select Vulnerabilities → Apply Attacks → Evaluate Results  │
│         ↓                      ↓                   ↓        │
│  87+ vulnerability types   30+ attack       CVSS scoring    │
│  13 categories             techniques     Framework mapping │
└─────────────────────────────────────────────────────────────┘
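
The select, attack, evaluate flow in the diagram can be sketched as a simple loop. This is a minimal illustration, not Rogue's implementation: `run_red_team`, `AttackResult`, the stand-in agent and judge, and the vulnerability and attack names are all hypothetical, and a real run would call the target agent and an LLM judge instead of the toy functions used here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List


@dataclass
class AttackResult:
    vulnerability: str
    attack: str
    response: str
    exploited: bool


def run_red_team(
    vulnerabilities: List[str],
    attacks: List[str],
    send_to_agent: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> List[AttackResult]:
    """Try every selected attack against every selected vulnerability."""
    results = []
    for vulnerability, attack in product(vulnerabilities, attacks):
        # Hypothetical prompt format; a real attack would craft this carefully.
        prompt = f"[{attack}] probe for {vulnerability}"
        response = send_to_agent(prompt)
        exploited = judge(vulnerability, response)
        results.append(AttackResult(vulnerability, attack, response, exploited))
    return results


def fake_agent(prompt: str) -> str:
    # Stand-in target: leaks data only when probed for PII.
    return "SSN: 000-00-0000" if "pii" in prompt else "I can't help with that."


def fake_judge(vulnerability: str, response: str) -> bool:
    # Stand-in for an LLM judge: flags any response that contains an SSN.
    return "SSN" in response


results = run_red_team(
    ["prompt-injection", "pii"], ["base64", "roleplay"], fake_agent, fake_judge
)
exploited = [(r.vulnerability, r.attack) for r in results if r.exploited]
print(exploited)
```

Passing the agent and the judge in as callables keeps the loop independent of how the target is reached or how a response is scored.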

Red Team Scan Types

Scan Type   Vulnerabilities         Attacks          Use Case
Basic       10 (Prompt + PII)       5 free           Quick security check
Full        87+ (all categories)    30+ (all)        Comprehensive audit
Custom      User-selected           User-selected    Targeted testing
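
One way to picture how a scan type narrows the vulnerability set is a small resolver over a catalog. The sketch below is an assumption-laden illustration: the catalog entries, category names, and `resolve_vulnerabilities` are hypothetical and far smaller than the 87+ types Rogue ships. Only the Basic = prompt + PII, Full = everything, Custom = user-selected split comes from the table above.

```python
from typing import Dict, List, Optional

# Hypothetical, heavily truncated catalog: category -> vulnerability types.
CATALOG: Dict[str, List[str]] = {
    "prompt-security": ["prompt-injection", "system-prompt-leak"],
    "pii": ["pii-direct", "pii-session"],
    "bias": ["gender-bias"],
}

# Per the table, a Basic scan covers only prompt and PII vulnerabilities.
BASIC_CATEGORIES = ["prompt-security", "pii"]


def resolve_vulnerabilities(
    scan_type: str, custom: Optional[List[str]] = None
) -> List[str]:
    """Return the vulnerability types a given scan type would test."""
    if scan_type == "basic":
        return [v for c in BASIC_CATEGORIES for v in CATALOG[c]]
    if scan_type == "full":
        return [v for types in CATALOG.values() for v in types]
    if scan_type == "custom":
        if not custom:
            raise ValueError("a custom scan needs at least one vulnerability")
        known = {v for types in CATALOG.values() for v in types}
        unknown = [v for v in custom if v not in known]
        if unknown:
            raise ValueError(f"unknown vulnerability types: {unknown}")
        return list(custom)
    raise ValueError(f"unknown scan type: {scan_type!r}")


print(resolve_vulnerabilities("basic"))
print(resolve_vulnerabilities("custom", ["gender-bias"]))
```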

Evaluation Workflow (Policy)

1. Configure: You provide the endpoint and authentication details for the agent you want to test, and select the LLMs you want Rogue to use for its services (scenario generation, judging).
2. Generate Scenarios: You input the “business context”, a high-level description of what your agent is supposed to do. Rogue’s LLM Service uses this context to generate a list of relevant test scenarios. You can review and edit these scenarios.
3. Run & Evaluate: You start the evaluation. The Scenario Evaluation Service spins up the EvaluatorAgent, which begins a conversation with your agent for each scenario. You can watch this conversation happen live through the TUI or Web UI.
4. View Report: Once all scenarios are complete, the LLM Service analyzes the results and generates a Markdown-formatted report, giving you a clear summary of your agent’s performance.
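
The run, judge, and report steps above can be sketched roughly as follows. Everything here is illustrative: `evaluate`, `markdown_report`, and the stand-in conversation and judge functions are hypothetical names rather than Rogue's API, and in Rogue both the conversation and the judgment are LLM-driven instead of string checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScenarioResult:
    scenario: str
    passed: bool


def evaluate(
    scenarios: List[str],
    run_conversation: Callable[[str], List[str]],
    judge: Callable[[str, List[str]], bool],
) -> List[ScenarioResult]:
    """Hold one conversation per scenario and record the judge's verdict."""
    results = []
    for scenario in scenarios:
        transcript = run_conversation(scenario)
        results.append(ScenarioResult(scenario, judge(scenario, transcript)))
    return results


def markdown_report(results: List[ScenarioResult]) -> str:
    """Render a short Markdown summary with the pass count and verdicts."""
    passed = sum(r.passed for r in results)
    lines = [
        "# Policy Evaluation Report",
        "",
        f"Passed {passed} of {len(results)} scenarios.",
        "",
    ]
    lines += [f"- {'PASS' if r.passed else 'FAIL'}: {r.scenario}" for r in results]
    return "\n".join(lines)


def fake_conversation(scenario: str) -> List[str]:
    # Stand-in for the EvaluatorAgent talking to the agent under test.
    if "discount" in scenario:
        return ["evaluator: Can I get a discount?", "agent: Sure, here is 50% off!"]
    return ["evaluator: What are your opening hours?", "agent: We are open 9 to 5."]


def fake_judge(scenario: str, transcript: List[str]) -> bool:
    # Toy policy: the agent must never offer a discount.
    agent_turns = [turn for turn in transcript if turn.startswith("agent:")]
    return not any("% off" in turn for turn in agent_turns)


scenarios = ["Agent refuses discount requests", "Agent answers opening-hours questions"]
results = evaluate(scenarios, fake_conversation, fake_judge)
report = markdown_report(results)
print(report)
```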

Red Team Workflow

1. Select Scan Type: Choose Basic (free), Full (premium), or Custom
2. Configure Vulnerabilities: Select from 87+ vulnerability types across 13 categories
3. Select Attacks: Choose from 30+ attack techniques (single-turn, multi-turn, agentic)
4. Run Red Team: Orchestrator systematically tests each vulnerability
5. Evaluate Results: LLM judges determine if attacks succeeded
6. Calculate Risk: CVSS-based scoring with severity levels
7. Generate Report: Compliance mapping to OWASP, MITRE, NIST, and more
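
To make the "Calculate Risk" step concrete, the sketch below maps a numeric score to a severity level using the standard CVSS v3.x qualitative rating bands (None, Low, Medium, High, Critical). Whether Rogue uses exactly these thresholds is an assumption, and `cvss_severity` and the sample findings are hypothetical.

```python
from typing import Dict


def cvss_severity(score: float) -> str:
    """Map a CVSS base score (0.0 to 10.0) to its v3.x qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores range from 0.0 to 10.0, got {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 - 3.9
    if score < 7.0:
        return "Medium"    # 4.0 - 6.9
    if score < 9.0:
        return "High"      # 7.0 - 8.9
    return "Critical"      # 9.0 - 10.0


# Hypothetical findings: vulnerability name -> score.
findings: Dict[str, float] = {"prompt-injection": 8.1, "pii-leak": 5.3}
severities = {name: cvss_severity(score) for name, score in findings.items()}
print(severities)
```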

Interface Options

- Default Mode (uvx rogue-ai): starts both the server and the TUI for immediate use
- Web UI Mode (uvx rogue-ai ui): browser-based interaction; requires a running server
- CLI Mode (uvx rogue-ai cli): automated testing and CI/CD integration
- Server Only (uvx rogue-ai server): runs just the backend for custom integrations
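
When launching Rogue from a script, it can help to build these command lines in one place. The helper below is a small sketch: `rogue_command` and the `MODES` table are hypothetical, and only the four commands listed above are taken from this page. No extra flags are assumed.

```python
import shlex
from typing import Dict, List

# Mode name -> subcommand arguments, mirroring the four modes listed above.
MODES: Dict[str, List[str]] = {
    "default": [],          # server + TUI
    "ui": ["ui"],           # web UI (needs a running server)
    "cli": ["cli"],         # automation and CI/CD
    "server": ["server"],   # backend only
}


def rogue_command(mode: str = "default") -> List[str]:
    """Return the argv list for the requested mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    return ["uvx", "rogue-ai", *MODES[mode]]


print(shlex.join(rogue_command("server")))  # prints: uvx rogue-ai server
# To actually launch it: subprocess.run(rogue_command("server"), check=True)
```

Because the Web UI needs a running server, a split deployment would start the `server` command first and the `ui` command afterwards.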
