Evaluations

Evaluation is the fundamental building block for ensuring the integrity of your content. Think of it as a comprehensive test tailored to your specific needs. Familiarizing yourself with Evaluation is crucial, as many other features rely on or integrate with it.

Creating a new Evaluation

The process involves selecting one or more ‘checks’ – criteria you want to examine. The available checks are:

Security checks

Reliability checks

Assertions

Syntax Check

You have the flexibility to create Evaluations manually, choose a template from our library, or simply let our AI assistant handle it for you. The choice is yours!
On the Evaluations page, you can manage all your created Evaluations with ease (edit, duplicate or delete them if necessary).

An Evaluation on its own is just a set of checks, you can harness those checks in many ways like, experiments for regression tests, while using the prompt management as part of the prompt engineering phase and of course as a part of you active protection.

Qualifire’s Small Language Models (SLMs) Judges

Qualifire employs a suite of fine-tuned, state-of-the-art Small Language Models (SLMs), each specialized for a specific evaluation task. This provides faster, more accurate, and more targeted analysis of agent behavior. We offer to integrate the Qualifire state-of-the-art evaluations to your existing stack, supercharging your evaluation framework.

Sentinel - Prompt injections

Cleric - Content safety moderation

Cleric is designed for content safety moderation, diligently filtering out inappropriate or harmful content to maintain a secure environment.
F1 Score: 0.946 | Latency: 35 ms

Category	Description
Dangerous Content	Promotes/facilitates harmful activities, self-harm, violence instructions
Harassment	Abusive behavior, bullying, targeting individuals/groups
Sexually Explicit Information	Depicts sexual acts, non-consensual sexual content, adult content
Hate Speech	Promotes violence, incites hatred, discrimination against protected groups

Paladin - Context grounding

Ranger - AI Agents TSQ (Tool selection quality)

Magistrate - Standards Enforcement

Sage - Hallucination detection

Hunter - PII detector

Get Started

Evaluations

Guardrails

Observability

Prompt Management

Integrations

Evaluations

Evaluations

Creating a new Evaluation

Qualifire’s Small Language Models (SLMs) Judges

Get Started

Evaluations

Guardrails

Observability

Prompt Management

Integrations

​Evaluations

​Creating a new Evaluation

​Qualifire’s Small Language Models (SLMs) Judges

Evaluations

Creating a new Evaluation

Qualifire’s Small Language Models (SLMs) Judges