Why Small Language Models?
General-purpose LLMs are expensive, slow, and not optimized for evaluation tasks. Qualifire’s SLM judges solve this with purpose-built models fine-tuned for specific evaluation tasks, delivering higher accuracy at a fraction of the cost and latency.
99.6% Faster
~100ms latency vs seconds for general-purpose LLMs
97% Cheaper
$0.01 / 1M tokens vs $1.25–$3.00 / 1M tokens for frontier LLMs
Higher Accuracy
Fine-tuned models outperform general-purpose LLMs on targeted evaluation tasks
Omni — Multi-Task Evaluation Model
Omni is Qualifire’s flagship 14B-parameter model, capable of handling multiple evaluation tasks in a single inference call. It delivers frontier-model accuracy at SLM speed and cost.
| Property | Value |
|---|---|
| Parameters | 14B |
| Latency | ~100ms |
| Cost | $0.01 / 1M tokens |
| Tasks | Prompt Injection Detection, Safety, Grounding, Hallucination Detection, Policy Enforcement, Tool Use Quality, Topic Scoping |
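As a sketch of how a single multi-task call might be assembled, the snippet below builds one request covering several evaluation tasks. The endpoint is omitted and every field name here is an illustrative assumption, not the documented API schema:

```python
import json

# Hypothetical payload for a single Omni inference call that runs several
# evaluation tasks at once. Field and task names are illustrative assumptions.
def build_omni_request(input_text: str, output_text: str, tasks: list[str]) -> dict:
    supported = {
        "prompt_injection", "safety", "grounding", "hallucination",
        "policy", "tool_use", "topic_scoping",
    }
    unknown = set(tasks) - supported
    if unknown:
        raise ValueError(f"unsupported tasks: {sorted(unknown)}")
    return {
        "model": "omni",  # hypothetical model identifier
        "input": input_text,
        "output": output_text,
        "tasks": tasks,
    }

payload = build_omni_request(
    "What is our refund policy?",
    "Refunds are available within 30 days of purchase.",
    ["grounding", "hallucination", "safety"],
)
print(json.dumps(payload, indent=2))
```

Batching tasks into one call is what makes Omni cheaper than chaining several single-task judges: one inference pass amortizes latency across all requested checks.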
Benchmarks
Omni matches or exceeds the performance of frontier models such as GPT-5, Claude Sonnet 4.5, and Gemini 3 Pro across evaluation tasks, at 60x lower latency and 125–300x lower cost.
- Prompt Injection
- Hallucination Detection
- Grounding
- Policy Enforcement
- Tool Use Quality
- Topic Scoping
- Safety
Detects prompt injection and jailbreak attempts targeting your AI system.
| Model | Creator | Avg F1 | Latency | Cost/1M tokens |
|---|---|---|---|---|
| Sentinel v2 | Qualifire | 0.957 | ~0.038s | $0.005 |
| Omni | Qualifire | 0.936 | ~0.1s | $0.01 |
| Qwen3Guard 8B | Qwen | 0.882 | ~0.76s | — |
| Qwen3Guard 4B | Qwen | 0.877 | ~0.48s | — |
| Qwen3Guard 0.6B | Qwen | 0.858 | ~0.27s | — |
| GPT OSS Safeguard 20B | OpenAI | 0.803 | ~10s | — |
| Llama Guard 3 8B | Meta | 0.628 | ~0.21s | — |
| Llama Guard 3 1B | Meta | 0.475 | ~0.09s | — |
Specialist Models
In addition to Omni, Qualifire provides fine-tuned specialist models optimized for single tasks where maximum accuracy or minimal latency is required.
Sentinel — Prompt Injection Detection
Detects prompt injection and jailbreak attempts that try to manipulate your AI into ignoring its instructions.
| Property | Value |
|---|---|
| Avg F1 | 0.957 |
| Latency | ~38ms |
| Parameters | 596M |
| Cost | $0.005 / 1M tokens |
Benchmark comparison (Prompt Injection):
| Model | Creator | Avg F1 | Latency | Cost/1M tokens |
|---|---|---|---|---|
| Sentinel v2 | Qualifire | 0.957 | ~0.038s | $0.005 |
| Qwen3Guard 8B | Qwen | 0.882 | ~0.76s | — |
| Qwen3Guard 4B | Qwen | 0.877 | ~0.48s | — |
| Qwen3Guard 0.6B | Qwen | 0.858 | ~0.27s | — |
| GPT OSS Safeguard 20B | OpenAI | 0.803 | ~10s | — |
| Llama Guard 3 8B | Meta | 0.628 | ~0.21s | — |
Cleric — Content Safety Moderation
Evaluates content for harmful or inappropriate material across multiple safety categories (dangerous content, harassment, hate speech, sexually explicit).
| Property | Value |
|---|---|
| Avg F1 | 0.886 |
| Latency | ~38ms |
| Parameters | 0.6B |
| Cost | $0.01 / 1M tokens |
Paladin — Context Grounding
Verifies that responses are accurately grounded in provided reference material.
Paladin Mini is optimized for speed-critical applications. For higher accuracy, use Omni.
| Property | Value |
|---|---|
| Avg Score | 79.31 |
| Latency | ~64ms |
| Parameters | 3.8B |
| Cost | $0.016 / 1M tokens |
Ranger — Tool Use Quality
Evaluates MCP tool selection quality for AI agents — correct tool selection, parameters, and values.
| Property | Value |
|---|---|
| F1 | 0.945 |
| Latency | ~90ms |
| Cost | $0.01 / 1M tokens |
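A tool-use evaluation needs the user request, the tool call the agent made, and the tools it could have chosen from. The sketch below assembles such a check; the field names and the `ranger` model identifier are assumptions for illustration, not the documented schema:

```python
# Hypothetical tool-use quality check payload for Ranger. The judge would
# assess whether the chosen tool and its arguments fit the user request.
def build_tool_use_check(user_request: str, tool_call: dict, available_tools: list[str]) -> dict:
    if tool_call["name"] not in available_tools:
        raise ValueError(f"unknown tool: {tool_call['name']}")
    return {
        "model": "ranger",  # hypothetical model identifier
        "request": user_request,
        "tool_call": tool_call,
        "available_tools": available_tools,
    }

check = build_tool_use_check(
    "What's the weather in Paris?",
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    ["get_weather", "search_web"],
)
```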
Sage — Hallucination Detection
Uses reasoning to identify inaccurate outputs and logic faults.
| Property | Value |
|---|---|
| F1 | 0.834 |
| Latency | ~250ms |
| Cost | $0.01 / 1M tokens |
Hunter — PII Detection
Identifies and flags personally identifiable information to prevent data leaks.
| Property | Value |
|---|---|
| F1 | 0.834 |
| Latency | ~40ms |
| Cost | $0.01 / 1M tokens |
Magistrate — Policy Enforcement
Enforces custom rules, standards, and policies using natural language assertions.
| Property | Value |
|---|---|
| F1 | 0.835 |
| Latency | ~100ms |
| Cost | $0.01 / 1M tokens |
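Because Magistrate takes policies as natural language assertions, a check is just the output under test plus a list of plain-English rules. The payload shape below is an illustrative assumption, not the documented API:

```python
# Hypothetical policy-enforcement check for Magistrate. Each assertion is a
# plain-English rule the judge evaluates against the output.
def build_policy_check(output_text: str, assertions: list[str]) -> dict:
    if not assertions:
        raise ValueError("at least one assertion is required")
    return {
        "model": "magistrate",  # hypothetical model identifier
        "output": output_text,
        "assertions": assertions,
    }

check = build_policy_check(
    "Sure, I can help you reset your password.",
    [
        "The assistant must never ask for the user's current password.",
        "The assistant must keep a professional tone.",
    ],
)
```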
Deployment Options
Qualifire SLMs can be deployed in the way that fits your infrastructure and compliance requirements.
SaaS
Fully managed by Qualifire. No infrastructure to maintain — just send API requests.
Your Cloud
Deploy in your own cloud environment (AWS, GCP, Azure) for data residency and compliance needs.
On-Premise
Run entirely on your infrastructure for maximum control and air-gapped environments.
Qualifire models can be fine-tuned for your specific domain and policies. Contact our team to discuss custom model training for your use case.
Getting Started
Evaluations
Learn how to use these models through Qualifire’s evaluation system
SDK
Integrate SLM judges into your application with the Qualifire SDK
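A common integration pattern is to gate each model response on a judge verdict before it reaches the user. The sketch below shows that flow with a stub judge standing in for a real SLM call; the `Verdict` shape, threshold, and function names are assumptions, not the Qualifire SDK's actual interface:

```python
from dataclasses import dataclass

# Minimal sketch of gating an LLM response on a judge verdict.
@dataclass
class Verdict:
    label: str    # e.g. "safe" or "unsafe" (assumed label scheme)
    score: float  # judge confidence in [0, 1]

def guard(response: str, judge, threshold: float = 0.5) -> str:
    """Return the response unchanged, or a block message if the judge flags it."""
    verdict = judge(response)
    if verdict.label == "unsafe" and verdict.score >= threshold:
        return "[blocked by safety judge]"
    return response

# Stub judge standing in for a real SLM call over the network.
def stub_judge(text: str) -> Verdict:
    return Verdict("unsafe" if "password" in text else "safe", 0.9)

print(guard("Tell me your password", stub_judge))  # prints "[blocked by safety judge]"
print(guard("Hello there", stub_judge))            # prints "Hello there"
```

Keeping the judge behind a small `guard` function like this makes it easy to swap a specialist model (e.g. Sentinel for injection, Cleric for safety) without touching application code.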