Client-Server Architecture
Rogue operates on a client-server architecture that separates the core evaluation logic from the user interfaces:
- Rogue Server: The backend that handles all evaluation logic, scenario generation, red teaming, and agent interactions
- Multiple Client Interfaces: Different ways to interact with the server:
  - TUI (Terminal UI): Modern terminal interface built with Go and Bubble Tea
  - Web UI: Gradio-based web interface for browser-based interaction
  - CLI: Command-line interface for automation and CI/CD pipelines
Two Evaluation Modes
Rogue offers two complementary evaluation modes:
1. Policy Evaluation
Tests whether your agent follows its intended business logic and policies. Workflow:
- Configure: Provide agent endpoint, authentication, and LLM settings
- Generate Scenarios: Input business context to auto-generate test scenarios
- Run & Evaluate: EvaluatorAgent conducts conversations for each scenario
- View Report: Get a summary of policy compliance with pass/fail rates
2. Red Team Security Testing
Tests your agent’s resistance to adversarial attacks and security vulnerabilities. Workflow:
- Configure: Select scan type (Basic, Full, or Custom) and target vulnerabilities
- Attack Execution: Red Team Orchestrator applies 30+ attack techniques
- Evaluate Responses: LLM judges detect successful exploits
- Risk Assessment: Calculate CVSS-based scores and map to compliance frameworks
Red Team Scan Types
| Scan Type | Vulnerabilities | Attacks | Use Case |
|---|---|---|---|
| Basic | 10 (Prompt + PII) | 5 free | Quick security check |
| Full | 87+ (all categories) | 30+ (all) | Comprehensive audit |
| Custom | User-selected | User-selected | Targeted testing |
Evaluation Workflow (Policy)
- Configure: You provide the endpoint and authentication details for the agent you want to test, and select the LLMs you want Rogue to use for its services (scenario generation, judging).
- Generate Scenarios: You input the “business context” or a high-level description of what your agent is supposed to do. Rogue’s `LLM Service` uses this context to generate a list of relevant test scenarios. You can review and edit these scenarios.
- Run & Evaluate: You start the evaluation. The `Scenario Evaluation Service` spins up the `EvaluatorAgent`, which begins a conversation with your agent for each scenario. You can watch this conversation happen live through the TUI or Web UI.
- View Report: Once all scenarios are complete, the `LLM Service` analyzes the results and generates a Markdown-formatted report, giving you a clear summary of your agent’s performance.
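To make the reporting step concrete, the sketch below shows one way a pass/fail summary could be computed from per-scenario outcomes and rendered as Markdown. It is only an illustration: `ScenarioResult`, `summarize`, and `to_markdown` are hypothetical names, and Rogue's actual result schema and report layout are not described here.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Hypothetical record of one scenario's outcome (not Rogue's real schema)."""
    scenario: str
    passed: bool


def summarize(results: list[ScenarioResult]) -> dict:
    """Count passes and failures and compute the overall pass rate."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        # Guard against division by zero when no scenarios were run.
        "pass_rate": passed / total if total else 0.0,
    }


def to_markdown(summary: dict) -> str:
    """Render the summary as a small Markdown table."""
    lines = [
        "| Metric | Value |",
        "|---|---|",
        f"| Scenarios | {summary['total']} |",
        f"| Passed | {summary['passed']} |",
        f"| Failed | {summary['failed']} |",
        f"| Pass rate | {summary['pass_rate']:.1%} |",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up scenario names, purely for demonstration.
    demo = [
        ScenarioResult("refuses out-of-policy discount", True),
        ScenarioResult("does not reveal internal pricing", False),
    ]
    print(to_markdown(summarize(demo)))
```

The empty-input guard matters in practice: if scenario generation yields nothing, the summary reports a 0% pass rate instead of raising an error.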
Red Team Workflow
- Select Scan Type: Choose Basic (free), Full (premium), or Custom
- Configure Vulnerabilities: Select from 87+ vulnerability types across 13 categories
- Select Attacks: Choose from 30+ attack techniques (single-turn, multi-turn, agentic)
- Run Red Team: Orchestrator systematically tests each vulnerability
- Evaluate Results: LLM judges determine if attacks succeeded
- Calculate Risk: CVSS-based scoring with severity levels
- Generate Report: Compliance mapping to OWASP, MITRE, NIST, and more
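The risk step maps a numeric score to a severity level. How Rogue derives its scores is not specified here, so the sketch below only assumes the standard CVSS v3.x qualitative rating scale (None, Low, Medium, High, Critical). The function name `cvss_severity` is hypothetical.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating.

    Thresholds follow the standard CVSS v3.x scale; whether Rogue uses
    exactly these bands is an assumption.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score must be between 0.0 and 10.0, got {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 to 3.9
    if score < 7.0:
        return "Medium"    # 4.0 to 6.9
    if score < 9.0:
        return "High"      # 7.0 to 8.9
    return "Critical"      # 9.0 to 10.0


if __name__ == "__main__":
    for s in (0.0, 3.9, 5.5, 8.1, 9.8):
        print(s, cvss_severity(s))
```

Rejecting out-of-range scores up front keeps a malformed judge output from being silently reported as "Critical" or "None".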
Interface Options
- Default Mode: `uvx rogue-ai` starts both server and TUI for immediate use
- Web UI Mode: `uvx rogue-ai ui` for browser-based interaction (requires server running)
- CLI Mode: `uvx rogue-ai cli` for automated testing and CI/CD integration
- Server Only: `uvx rogue-ai server` to run just the backend for custom integrations
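For scripted setups, the four launch commands listed above can be assembled programmatically. The sketch below is a minimal helper that only builds the argument list for each mode. `MODES` and `build_command` are hypothetical names, and any additional flags the CLI mode may require are not covered because they are not documented in this section.

```python
# Subcommands taken from the list above; the default mode has no subcommand.
MODES = {
    "default": [],
    "ui": ["ui"],
    "cli": ["cli"],
    "server": ["server"],
}


def build_command(mode: str) -> list[str]:
    """Return the argv list that launches Rogue in the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    return ["uvx", "rogue-ai", *MODES[mode]]


if __name__ == "__main__":
    # Print each command instead of running it, so the sketch has no side effects.
    for mode in MODES:
        print(f"{mode:>8}: {' '.join(build_command(mode))}")
```

The returned list can be passed directly to `subprocess.run` from a CI job. Because the Web UI mode requires a running server, a script would start the `server` command first.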