Skip to main content

Client-Server Architecture

Rogue operates on a client-server architecture that separates the core evaluation logic from the user interfaces:
  • Rogue Server: The backend that handles all evaluation logic, scenario generation, red teaming, and agent interactions
  • Multiple Client Interfaces: Different ways to interact with the server:
    • TUI (Terminal UI): Modern terminal interface built with Go and Bubble Tea
    • Web UI: Gradio-based web interface for browser-based interaction
    • CLI: Command-line interface for automation and CI/CD pipelines
This architecture allows for flexible deployment patterns where the server can run independently, and multiple clients can connect simultaneously.

Two Evaluation Modes

Rogue offers two complementary evaluation modes:

1. Policy Evaluation

Tests whether your agent follows its intended business logic and policies. Workflow:
  1. Configure: Provide agent endpoint, authentication, and LLM settings
  2. Generate Scenarios: Input business context to auto-generate test scenarios
  3. Run & Evaluate: EvaluatorAgent conducts conversations for each scenario
  4. View Report: Get a summary of policy compliance with pass/fail rates

2. Red Team Security Testing

Tests your agent’s resistance to adversarial attacks and security vulnerabilities. Workflow:
  1. Configure: Select scan type (Basic, Full, or Custom) and target vulnerabilities
  2. Attack Execution: Red Team Orchestrator applies 30+ attack techniques
  3. Evaluate Responses: LLM judges detect successful exploits
  4. Risk Assessment: Calculate CVSS-based scores and map to compliance frameworks
┌─────────────────────────────────────────────────────────────┐
│                   Red Team Workflow                          │
├─────────────────────────────────────────────────────────────┤
│  Select Vulnerabilities → Apply Attacks → Evaluate Results  │
│         ↓                      ↓                   ↓        │
│  87+ vulnerability types   30+ attack       CVSS scoring    │
│  13 categories            techniques      Framework mapping  │
└─────────────────────────────────────────────────────────────┘

Red Team Scan Types

Scan TypeVulnerabilitiesAttacksUse Case
Basic10 (Prompt + PII)5 freeQuick security check
Full87+ (all categories)30+ (all)Comprehensive audit
CustomUser-selectedUser-selectedTargeted testing

Evaluation Workflow (Policy)

  1. Configure: You provide the endpoint and authentication details for the agent you want to test, and select the LLMs you want Rogue to use for its services (scenario generation, judging).
  2. Generate Scenarios: You input the “business context” or a high-level description of what your agent is supposed to do. Rogue’s LLM Service uses this context to generate a list of relevant test scenarios. You can review and edit these scenarios.
  3. Run & Evaluate: You start the evaluation. The Scenario Evaluation Service spins up the EvaluatorAgent, which begins a conversation with your agent for each scenario. You can watch this conversation happen live through the TUI or Web UI.
  4. View Report: Once all scenarios are complete, the LLM Service analyzes the results and generates a Markdown-formatted report, giving you a clear summary of your agent’s performance.

Red Team Workflow

  1. Select Scan Type: Choose Basic (free), Full (premium), or Custom
  2. Configure Vulnerabilities: Select from 87+ vulnerability types across 13 categories
  3. Select Attacks: Choose from 30+ attack techniques (single-turn, multi-turn, agentic)
  4. Run Red Team: Orchestrator systematically tests each vulnerability
  5. Evaluate Results: LLM judges determine if attacks succeeded
  6. Calculate Risk: CVSS-based scoring with severity levels
  7. Generate Report: Compliance mapping to OWASP, MITRE, NIST, and more

Interface Options

  • Default Mode: uvx rogue-ai starts both server and TUI for immediate use
  • Web UI Mode: uvx rogue-ai ui for browser-based interaction (requires server running)
  • CLI Mode: uvx rogue-ai cli for automated testing and CI/CD integration
  • Server Only: uvx rogue-ai server to run just the backend for custom integrations