Evaluations
Evaluations are the fundamental building blocks for ensuring the integrity of your AI agent’s behavior. Each evaluation acts as a specialized check tailored to catch specific issues before they reach your users.
Evaluation Modes
Each evaluation can run in different modes, letting you choose the right balance of speed and thoroughness for your use case:
Speed
~20ms latency. Simple pass/fail with the fastest possible response. Best for real-time guardrails and high-throughput systems.
Balanced
~100ms latency. Includes reasoning explanations while maintaining good performance. Best for production use with explanations.
Quality
~500ms latency. Uses larger models for the most thorough analysis. Best for detailed analysis, debugging, and experiments.
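Modes are set per evaluation, so latency-critical checks can run in Speed while offline analysis uses Quality. As a minimal illustrative sketch (not part of the Qualifire SDK; the names below are hypothetical), a client might pick the most thorough mode that still fits its latency budget:

```python
from enum import Enum

class EvalMode(Enum):
    """Hypothetical client-side labels for the three evaluation modes."""
    SPEED = "speed"        # ~20 ms, simple pass/fail
    BALANCED = "balanced"  # ~100 ms, adds reasoning explanations
    QUALITY = "quality"    # ~500 ms, most thorough analysis

def pick_mode(latency_budget_ms: float) -> EvalMode:
    """Choose the most thorough mode that still fits the caller's latency budget."""
    if latency_budget_ms >= 500:
        return EvalMode.QUALITY
    if latency_budget_ms >= 100:
        return EvalMode.BALANCED
    return EvalMode.SPEED

# A real-time guardrail with a 50 ms budget falls back to Speed mode.
assert pick_mode(50) is EvalMode.SPEED
```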
Evaluation Categories
Security
Protect your AI system from attacks and prevent sensitive data exposure.
Includes prompt injection detection and PII scanning.
Safety
Ensure your AI produces appropriate, non-harmful content across multiple
safety categories including dangerous content, harassment, and hate speech.
Reliability
Verify that your AI produces accurate, high-quality outputs. Includes
hallucination detection, context grounding, and tool selection quality.
Policy
Enforce your custom rules and guardrails using natural language assertions.
Define any policy and have it consistently enforced.
Topic Scoping
Ensure your AI stays on-topic by defining allowed topics. Detects when
conversations drift outside the intended scope of your application.
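Topic scoping is configured by declaring which subjects are in bounds. As a hypothetical sketch of what such a client-side configuration could look like (the class and field names are illustrative, not the Qualifire schema):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TopicScope:
    """Hypothetical client-side config for a topic-scoping check."""
    allowed_topics: List[str] = field(default_factory=list)

scope = TopicScope(allowed_topics=["billing", "shipping", "returns"])
# A conversation about, say, political news would be flagged as off-topic drift.
```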
Qualifire’s Small Language Model (SLM) Judges
Qualifire employs a suite of fine-tuned, state-of-the-art Small Language Models (SLMs), each specialized for a specific evaluation task. This provides faster, more accurate, and more targeted analysis of agent behavior.
Sentinel - Prompt Injection Detection
Detects prompt injection and jailbreak attempts that try to manipulate your
AI into ignoring its instructions or behaving maliciously.
Results:
- BENIGN — Input is safe
- INJECTION — Attack attempt detected
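A typical use is as a real-time input guardrail: forward the message only when Sentinel returns BENIGN. The helper below is a minimal sketch assuming the result arrives as one of the two documented label strings; it is not the Qualifire SDK.

```python
def is_input_allowed(sentinel_label: str) -> bool:
    """Treat only the documented BENIGN label as safe; anything else is blocked."""
    return sentinel_label == "BENIGN"

# Example: stop an INJECTION attempt before it ever reaches the model.
label = "INJECTION"  # assumed to come from the Sentinel evaluation result
if not is_input_allowed(label):
    print("Request blocked: prompt injection attempt detected.")
```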
Cleric - Content Safety Moderation
Evaluates content for harmful or inappropriate material across multiple safety categories.
Results:
| Category | Description |
|---|---|
| Dangerous Content | Violence instructions, self-harm, harmful activities |
| Harassment | Bullying, abuse, targeting individuals or groups |
| Sexually Explicit | Adult content, non-consensual sexual content |
| Hate Speech | Discrimination, incitement against protected groups |
- SAFE — Content passes all safety checks
- UNSAFE — Harmful content detected (includes which categories were triggered)
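On the output side, an UNSAFE verdict usually means replacing the draft response and recording which categories fired. A minimal sketch, assuming the result is the documented SAFE/UNSAFE label plus a list of triggered category names (an assumed shape, not the SDK's response format):

```python
from typing import Sequence

def moderate_output(draft: str, cleric_label: str, triggered: Sequence[str]) -> str:
    """Return the draft unchanged when SAFE; otherwise substitute a safe fallback."""
    if cleric_label == "UNSAFE":
        # Record the triggered categories (e.g. "Hate Speech") for review.
        print(f"Blocked draft response; triggered categories: {', '.join(triggered)}")
        return "I can't help with that."
    return draft

print(moderate_output("some draft reply", "UNSAFE", ["Dangerous Content"]))
```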
Paladin - Context Grounding
Verifies that responses are properly anchored in your provided reference material.
Ensures claims are supported by source documents or the system prompt.
Configuration:
- Single-turn: Evaluates against the system prompt only
- Multi-turn: Evaluates against the full conversation history
- GROUNDED — Response is supported by the context
- UNGROUNDED — Response makes claims not found in context
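Which reference text the GROUNDED/UNGROUNDED verdict is judged against follows from that configuration. A minimal sketch of assembling it client-side, assuming the common role/content chat-message format (an assumption, not the SDK's types):

```python
from typing import Dict, List

def grounding_context(messages: List[Dict[str, str]], multi_turn: bool) -> str:
    """Single-turn: system prompt only. Multi-turn: the full conversation history."""
    if multi_turn:
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return "\n".join(m["content"] for m in messages if m["role"] == "system")

history = [
    {"role": "system", "content": "You answer only from the provided product docs."},
    {"role": "user", "content": "What is the refund window?"},
]
print(grounding_context(history, multi_turn=False))  # system prompt only
```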
Ranger - Tool Selection Quality (TSQ)
Evaluates whether your AI agent correctly selects and calls tools/functions.
Catches wrong tool selection, invalid parameter names, and incorrect parameter values.
Results:
- VALID_CALL — Tool call is correct
- TOOL_ERROR — Wrong tool was selected
- PARAM_NAME_ERROR — Invalid parameter name used
- PARAM_VALUE_ERROR — Parameter value is incorrect
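Each label points at a different fix, which makes it easy to route a failed call back to the agent with a targeted correction. A minimal, hypothetical mapping (not prescribed by Qualifire):

```python
REMEDIATION = {
    "VALID_CALL": "execute the tool call as-is",
    "TOOL_ERROR": "re-prompt the agent with the list of available tools",
    "PARAM_NAME_ERROR": "re-prompt the agent with the tool's parameter schema",
    "PARAM_VALUE_ERROR": "ask the agent to correct the offending parameter value",
}

def next_step(tsq_label: str) -> str:
    """Map a documented TSQ result label to a follow-up action; fail closed otherwise."""
    return REMEDIATION.get(tsq_label, "reject the call and escalate")

print(next_step("PARAM_VALUE_ERROR"))
```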
Magistrate - Policy Enforcement
Evaluates whether content complies with your custom-defined policies and guardrails.
Define any rule in natural language and enforce it consistently.
Example assertions:
- “Response must not provide medical advice”
- “Always recommend consulting a professional for legal matters”
- “Never disclose internal pricing information”
- “Responses should be in a professional tone”
- Target: Choose what to evaluate
  - input — Check only the user’s message
  - output — Check only the AI’s response
  - both — Check the entire conversation
- COMPLIES — Content follows the policy
- WARNING — Potential concern (borderline case)
- VIOLATES — Content breaks the policy
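Below is a minimal sketch of describing such a check client-side and acting on its verdict. The PolicyCheck class and the pass/flag/block mapping are illustrative assumptions, not the Qualifire schema; the assertions and targets come from the lists above.

```python
from dataclasses import dataclass, field
from typing import List, Literal

@dataclass
class PolicyCheck:
    """Hypothetical client-side description of a policy evaluation."""
    assertions: List[str] = field(default_factory=list)
    target: Literal["input", "output", "both"] = "output"

policy = PolicyCheck(
    assertions=[
        "Response must not provide medical advice",
        "Never disclose internal pricing information",
    ],
    target="output",
)

def policy_action(label: str) -> str:
    """One possible handling strategy for the documented result labels."""
    return {
        "COMPLIES": "pass",
        "WARNING": "flag for human review",  # borderline case
        "VIOLATES": "block the response",
    }.get(label, "block the response")       # fail closed on unknown labels

print(policy_action("WARNING"))
```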
Sage - Hallucination Detection
Identifies when your AI generates information that isn’t supported by the
provided context. Catches fabricated facts, invented details, and unfaithful responses.
Results:
- NOT_HALLUCINATED — Response is faithful to the context
- HALLUCINATED — Response contains unsupported claims
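A common pattern is to regenerate when the verdict is HALLUCINATED, then fall back to a safe refusal after a retry limit. A minimal sketch where generate and evaluate are hypothetical callables standing in for your model call and the hallucination check:

```python
from typing import Callable

def ensure_faithful(
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], str],
    context: str,
    max_attempts: int = 2,
) -> str:
    """Retry generation while the evaluation reports HALLUCINATED."""
    for _ in range(max_attempts):
        response = generate(context)
        if evaluate(response, context) == "NOT_HALLUCINATED":
            return response
    return "I can't answer that reliably from the information available."
```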
Hunter - PII Detection
Scans content for Personally Identifiable Information to prevent data leaks
and ensure privacy compliance.
Detected categories include:
- Personal identifiers (name, date of birth, address)
- Financial data (credit card, bank account, SSN)
- Government IDs (passport, driver’s license, national ID)
- Contact information (phone, email, IP address)
- Healthcare data (health insurance ID)
- NO_PII_FOUND — Content is clean
- PII_FOUND — Sensitive data detected (includes the specific type and location)
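Because a PII_FOUND result includes the type and location of each match, the offending spans can be masked before the text is stored or displayed. A minimal sketch, assuming findings have been normalized into (type, start, end) tuples (an assumed shape, not the SDK's response format):

```python
from typing import List, Tuple

def redact_pii(text: str, findings: List[Tuple[str, int, int]]) -> str:
    """Mask each detected span with its PII type, e.g. '[EMAIL]'."""
    # Replace from the end of the string so earlier character offsets stay valid.
    for pii_type, start, end in sorted(findings, key=lambda f: f[1], reverse=True):
        text = text[:start] + f"[{pii_type.upper()}]" + text[end:]
    return text

print(redact_pii("Reach me at jane@example.com", [("email", 12, 28)]))
# -> Reach me at [EMAIL]
```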
Combining Evaluations
You can run multiple evaluations simultaneously. The overall result passes only if all individual evaluations pass, giving you comprehensive coverage in a single check.
Bypass Behavior
When an evaluation can’t run due to missing requirements (e.g., no AI response yet for hallucination detection), it automatically bypasses with a pass result. This prevents evaluations from blocking your application when they don’t apply to the current context.
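Put together, aggregation stays simple: bypassed evaluations come back as passes, so the overall verdict is just "every evaluation passed". A minimal sketch with an assumed result shape (a list of dicts with a boolean passed field), not the SDK's actual response:

```python
from typing import Iterable, Mapping

def overall_passes(results: Iterable[Mapping[str, object]]) -> bool:
    """Pass only if every individual evaluation passed (bypassed checks report as passed)."""
    return all(r["passed"] for r in results)

print(overall_passes([
    {"name": "prompt_injection", "passed": True},
    {"name": "hallucination", "passed": True, "bypassed": True},  # no AI response yet
]))  # True
```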
For code examples showing how to run evaluations, see the SDK documentation.