
Overview

Rogue’s red teaming engine tests AI agents against a catalog of 87+ vulnerability types organized into 13 categories. Each vulnerability is mapped to a predefined set of attack techniques considered most effective for exploiting it.

Vulnerability Categories

Tests for generation of harmful, inappropriate, or dangerous content.

| ID | Name | Description | Tier |
|---|---|---|---|
| hate-speech | Hate Speech | Discriminatory content targeting protected groups | Premium |
| explicit-content | Explicit Content | Sexual or adult content generation | Premium |
| violent-crime | Violent Crime Content | Violence-promoting content | Premium |
| sexual-crime | Sexual Crime Content | Sexual crime content generation | Premium |
| child-exploitation | Child Exploitation | CSAM and exploitation content | Premium |
| harassment | Harassment | Bullying and harassment content | Premium |
| personal-attacks | Personal Attacks | Targeted insults and attacks | Premium |
| profanity | Profanity | Inappropriate language | Premium |
| graphic-content | Graphic Content | Violent or disturbing imagery | Premium |
| self-harm | Self-Harm | Self-harm and suicide content | Premium |
| dangerous-activity | Dangerous Activity | Instructions for dangerous actions | Premium |
| wordplay | Wordplay | Tricked into profanity via wordplay | Premium |

Tests for exposure of personally identifiable information.

| ID | Name | Description | Tier |
|---|---|---|---|
| pii-direct | PII via Direct Exposure | Direct revelation of personal data | Free |
| pii-api-db | PII via API/Database | Exposure through API/database access | Free |
| pii-session | PII via Session Data | Leakage through session information | Free |
| pii-social | PII via Social Engineering | Extraction through social engineering | Premium |
| cross-session-leakage | Cross-Session Leakage | Data leaking between user sessions | Free |
| privacy-violation | Privacy Violation | General privacy violation attempts | Free |

Tests for code injection and technical exploitation.

| ID | Name | Description | Tier |
|---|---|---|---|
| sql-injection | SQL Injection | SQL query injection via agent | Premium |
| shell-injection | Command Injection | Shell command injection | Premium |
| ssrf | SSRF Vulnerability | Server-side request forgery | Premium |
| debug-access | Debug Interface Exposure | Access to debug/admin functions | Premium |
| malicious-code | Malicious Code | Generation of malware/exploits | Premium |

Tests for discriminatory or biased responses.

| ID | Name | Description | Tier |
|---|---|---|---|
| bias-age | Age Bias | Age-related discrimination | Premium |
| bias-gender | Gender Bias | Gender-related discrimination | Premium |
| bias-race | Race Bias | Race-related discrimination | Premium |
| bias-disability | Disability Bias | Disability-related discrimination | Premium |
| bias-religion | Religious Bias | Religion-related discrimination | Premium |
| bias-political | Political Bias | Political bias in responses | Premium |

Tests for prompt manipulation and system compromise.

| ID | Name | Description | Tier |
|---|---|---|---|
| prompt-extraction | System Prompt Disclosure | Extraction of system prompts | Free |
| prompt-override | System Prompt Override | Overriding system instructions | Free |
| indirect-injection | Indirect Prompt Injection | Injection via untrusted data | Free |
| ascii-smuggling | ASCII Smuggling | Unicode tag-based smuggling | Free |
| special-token-injection | Special Token Injection | Token-based injection attacks | Free |
| pliny-injections | Pliny Prompt Injections | Specialized injection patterns | Premium |
| context-compliance | Context Compliance Attack | Fabricated history attacks | Premium |

Tests for authorization and permission bypass.

| ID | Name | Description | Tier |
|---|---|---|---|
| rbac | RBAC Implementation | Role-based access control bypass | Premium |
| bola | Object-Level Authorization | Broken object authorization (OWASP API 1) | Premium |
| bfla | Function-Level Authorization | Broken function authorization (OWASP API 5) | Premium |
| excessive-agency | Excessive Agency | Unauthorized actions beyond scope | Premium |

Tests for business rule violations and misalignment.

| ID | Name | Description | Tier |
|---|---|---|---|
| unauthorized-commitments | Unauthorized Commitments | Making unauthorized promises | Premium |
| unauthorized-advice | Unauthorized Advice | Providing restricted advice | Premium |
| competitors | Competitors | Promoting competitor products | Premium |
| imitation | Entity Impersonation | Impersonating other entities | Premium |
| goal-misalignment | Goal Misalignment | Optimizing wrong objectives | Premium |
| off-topic | Off-Topic Manipulation | Deviation from intended purpose | Premium |
| hijacking | Resource Hijacking | Unauthorized resource usage | Premium |

Tests for IP violations and theft.

| ID | Name | Description | Tier |
|---|---|---|---|
| ip-violations | IP Violations | Copyright and IP infringement | Premium |
| ip-theft | IP Theft | Trade secret and IP theft | Premium |

Tests for factual accuracy and reliability.

| ID | Name | Description | Tier |
|---|---|---|---|
| hallucination | Hallucination | Fabrication of false information | Premium |
| unverifiable-claims | Unverifiable Claims | Claims that cannot be verified | Premium |
| misinformation-disinformation | Disinformation | False information campaigns | Premium |
| overreliance | Overreliance | Excessive system assumption trust | Premium |

Tests for regulatory compliance violations.

| ID | Name | Description | Tier |
|---|---|---|---|
| coppa | COPPA Compliance | Children’s privacy violations | Premium |
| ferpa | FERPA Compliance | Educational records violations | Premium |

Tests for critical and dangerous content.

| ID | Name | Description | Tier |
|---|---|---|---|
| cybercrime | Cybercrime | Hacking and cybercrime content | Premium |
| wmd-content | WMD Content | Weapons of mass destruction | Premium |
| weapons-content | Weapons Content | Weapons manufacturing | Premium |
| ied-content | IED Content | Explosive device instructions | Premium |
| drugs | Drug-Related Content | Illegal drug information | Premium |
| methamphetamine | Methamphetamine Content | Meth production/use | Premium |
| extremism | Extremist Content | Extremist recruitment | Premium |
| illegal-activity | Illegal Activity | General illegal activities | Premium |
| non-violent-crime | Non-Violent Crime | Non-violent criminal acts | Premium |

Tests for AI agent architecture vulnerabilities.

| ID | Name | Description | Tier |
|---|---|---|---|
| memory-poisoning | Agentic Memory Poisoning | Corrupting agent memory | Premium |
| rag-poisoning | RAG Poisoning | Poisoning retrieval systems | Premium |
| rag-exfiltration | RAG Document Exfiltration | Extracting RAG documents | Premium |
| tool-discovery | Tool Discovery | Enumerating available tools | Premium |
| mcp-vulnerabilities | Model Context Protocol | MCP-specific attacks | Premium |

Tests for resource exhaustion and denial of service.

| ID | Name | Description | Tier |
|---|---|---|---|
| unbounded-consumption | Unbounded Consumption | Resource exhaustion | Premium |
| reasoning-dos | Reasoning DoS | Computational exhaustion | Premium |
| divergent-repetition | Divergent Repetition | Training data leakage | Premium |

Default Attack Mappings

Each vulnerability is mapped to a default list of attacks considered most effective against it:
# Example: prompt-extraction vulnerability
default_attacks = [
    "prompt-probing",   # Direct probing questions
    "system-override",  # Override commands
    "gray-box",         # Fake internal context
    "base64",           # Encoded requests
    "rot13",            # Obfuscated requests
]
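
When a scan covers several vulnerabilities, their default attack lists can overlap. The following is a minimal sketch, not Rogue's implementation, of merging such lists into one de-duplicated attack plan. The `DEFAULT_ATTACKS` dictionary and the `attacks_for` helper are hypothetical names, and the `prompt-override` entry is an assumed mapping included only for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping for illustration. Only the "prompt-extraction" entry
# mirrors the example above; the "prompt-override" entry is an assumption.
DEFAULT_ATTACKS: Dict[str, List[str]] = {
    "prompt-extraction": [
        "prompt-probing",
        "system-override",
        "gray-box",
        "base64",
        "rot13",
    ],
    "prompt-override": ["system-override", "base64"],  # assumed, not from Rogue
}


def attacks_for(vulnerability_ids: Iterable[str]) -> List[str]:
    """Return de-duplicated attack IDs for the given vulnerabilities, in first-seen order."""
    seen = set()
    plan: List[str] = []
    for vuln_id in vulnerability_ids:
        # Unknown vulnerability IDs contribute no attacks.
        for attack_id in DEFAULT_ATTACKS.get(vuln_id, []):
            if attack_id not in seen:
                seen.add(attack_id)
                plan.append(attack_id)
    return plan


if __name__ == "__main__":
    print(attacks_for(["prompt-extraction", "prompt-override"]))
```

Keeping first-seen order means the attacks listed first for the first vulnerability stay at the front of the plan.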

Accessing the Catalog

from rogue.server.red_teaming.catalog.vulnerabilities import (
    get_vulnerability,
    get_all_vulnerabilities,
    get_vulnerabilities_by_category,
    get_free_vulnerabilities,
    get_premium_vulnerabilities,
    get_basic_scan_vulnerabilities,
    get_full_scan_vulnerabilities
)

# Get a specific vulnerability
vuln = get_vulnerability("prompt-extraction")
print(f"{vuln.name}: {vuln.description}")

# Get all free vulnerabilities
free_vulns = get_free_vulnerabilities()

# Get vulnerabilities for basic scan
basic_vulns = get_basic_scan_vulnerabilities()
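
To illustrate what a free/premium split of the catalog amounts to, here is a self-contained sketch that does not import Rogue. `CatalogRow`, `SAMPLE_ROWS`, `free_ids`, and `premium_ids` are hypothetical stand-ins; the sample rows are copied from the tables above, and treating Tier "Free" as `premium=False` is an assumption.

```python
from typing import List, NamedTuple


class CatalogRow(NamedTuple):
    """Stand-in for a catalog entry; not Rogue's actual type."""

    id: str
    name: str
    premium: bool  # assumed: True for Tier "Premium", False for Tier "Free"


# Sample rows copied from the category tables.
SAMPLE_ROWS: List[CatalogRow] = [
    CatalogRow("pii-direct", "PII via Direct Exposure", premium=False),
    CatalogRow("pii-social", "PII via Social Engineering", premium=True),
    CatalogRow("prompt-extraction", "System Prompt Disclosure", premium=False),
    CatalogRow("sql-injection", "SQL Injection", premium=True),
]


def free_ids(rows: List[CatalogRow]) -> List[str]:
    """IDs of entries that are not marked premium."""
    return [row.id for row in rows if not row.premium]


def premium_ids(rows: List[CatalogRow]) -> List[str]:
    """IDs of entries that are marked premium."""
    return [row.id for row in rows if row.premium]


if __name__ == "__main__":
    print("free:", free_ids(SAMPLE_ROWS))
    print("premium:", premium_ids(SAMPLE_ROWS))
```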

Vulnerability Definition Structure

from dataclasses import dataclass
from typing import List

# VulnerabilityCategory is defined elsewhere in Rogue (not shown here).

@dataclass
class VulnerabilityDef:
    id: str                           # Unique identifier
    name: str                         # Display name
    category: VulnerabilityCategory   # Category grouping
    description: str                  # Detailed description
    default_attacks: List[str]        # Recommended attack IDs
    premium: bool                     # Requires API key
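
As a rough illustration of how one catalog entry might be expressed with this structure, the sketch below fills in the fields for `prompt-extraction` using the values from the table and the default attack example. The `VulnerabilityCategory` enum here is a stand-in with an assumed member name, `tier_label` is a hypothetical helper, and mapping `premium=False` to the "Free" tier is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class VulnerabilityCategory(Enum):
    """Stand-in enum; the member name and value are assumptions."""

    PROMPT_SECURITY = "prompt-security"


@dataclass(frozen=True)
class VulnerabilityDef:
    id: str
    name: str
    category: VulnerabilityCategory
    description: str
    default_attacks: List[str]
    premium: bool


def tier_label(vuln: VulnerabilityDef) -> str:
    """Hypothetical helper: map the premium flag to the Tier column label."""
    return "Premium" if vuln.premium else "Free"


prompt_extraction = VulnerabilityDef(
    id="prompt-extraction",
    name="System Prompt Disclosure",
    category=VulnerabilityCategory.PROMPT_SECURITY,
    description="Extraction of system prompts",
    default_attacks=["prompt-probing", "system-override", "gray-box", "base64", "rot13"],
    premium=False,
)

if __name__ == "__main__":
    print(f"{prompt_extraction.id} [{tier_label(prompt_extraction)}]")
```

Marking the dataclass as frozen is a design choice of this sketch: catalog definitions are treated as read-only reference data.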