An introduction to the Qualifire evaluations
Security checks
Reliability checks
Assertions
Syntax check
Sentinel - Prompt injections
Cleric - Content safety moderation
| Category | Description |
|---|---|
| Dangerous Content | Promotes or facilitates harmful activities, including self-harm and instructions for violence |
| Harassment | Abusive behavior or bullying that targets individuals or groups |
| Sexually Explicit Information | Depicts sexual acts, non-consensual sexual content, or other adult content |
| Hate Speech | Promotes violence against, incites hatred of, or discriminates against protected groups |
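As an illustration only, a client could model the four Cleric categories above as an enum. A helper could then turn per-category scores into a list of flagged categories. The `ContentCategory` enum, the score-dictionary shape, the `flagged_categories` helper, and the 0.5 threshold are hypothetical assumptions made for this sketch. They are not Qualifire's API or response format.

```python
from enum import Enum


class ContentCategory(Enum):
    """Hypothetical client-side model of the four Cleric moderation categories."""

    DANGEROUS_CONTENT = "Dangerous Content"
    HARASSMENT = "Harassment"
    SEXUALLY_EXPLICIT = "Sexually Explicit Information"
    HATE_SPEECH = "Hate Speech"


def flagged_categories(
    scores: dict[ContentCategory, float], threshold: float = 0.5
) -> list[ContentCategory]:
    """Return the categories whose score meets or exceeds the threshold.

    Assumption: each category has a score in [0, 1] and missing categories
    count as 0.0. This shape and the default threshold are illustrative
    only, not Qualifire's actual output.
    """
    # Iterating over the enum keeps results in the table's order.
    return [c for c in ContentCategory if scores.get(c, 0.0) >= threshold]


if __name__ == "__main__":
    example_scores = {
        ContentCategory.HARASSMENT: 0.82,
        ContentCategory.HATE_SPEECH: 0.10,
    }
    print([c.value for c in flagged_categories(example_scores)])  # ['Harassment']
```

Because the helper iterates over the enum rather than the score mapping, the flagged list always follows the table's category order, regardless of how the scores were supplied.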
Paladin - Context grounding
Ranger - Tool selection quality (TSQ) for AI agents
Magistrate - Standards enforcement
Sage - Hallucination detection
Hunter - PII detection